From dholmes at openjdk.java.net Mon Nov 1 02:12:12 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 1 Nov 2021 02:12:12 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence
In-Reply-To:
References:
Message-ID: <4DdPvz6BaeM-ekZb2BB51hbEubCZch8CmnzwjBfE4Wo=.cb94614d-8d67-4bf9-868c-0c2e04d1befe@github.com>

On Thu, 28 Oct 2021 08:47:31 GMT, Aleksey Shipilev wrote:

> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) to call to runtime for a single memory barrier. We can simplify the native `Unsafe` interface by falling back to `fullFence` when `{load|store}Fence` intrinsics are not available. This would be similar to what `Unsafe.{loadLoad|storeStore}Fences` do.
>
> This is the behavior of these intrinsics now, on x86_64, using benchmarks from JDK-8276054:
>
>
> Benchmark          Mode  Cnt   Score   Error  Units
>
> # Default
> Single.acquire     avgt    3   0.407 ± 0.060  ns/op
> Single.full        avgt    3   4.693 ± 0.005  ns/op
> Single.loadLoad    avgt    3   0.415 ± 0.095  ns/op
> Single.plain       avgt    3   0.406 ± 0.002  ns/op
> Single.release     avgt    3   0.408 ± 0.047  ns/op
> Single.storeStore  avgt    3   0.408 ± 0.043  ns/op
>
> # -XX:DisableIntrinsic=_storeFence
> Single.acquire     avgt    3   0.408 ± 0.016  ns/op
> Single.full        avgt    3   4.694 ± 0.002  ns/op
> Single.loadLoad    avgt    3   0.406 ± 0.002  ns/op
> Single.plain       avgt    3   0.406 ± 0.001  ns/op
> Single.release     avgt    3   4.694 ± 0.003  ns/op  <--- upgraded to full
> Single.storeStore  avgt    3   4.690 ± 0.005  ns/op  <--- upgraded to full
>
> # -XX:DisableIntrinsic=_loadFence
> Single.acquire     avgt    3   4.691 ± 0.001  ns/op  <--- upgraded to full
> Single.full        avgt    3   4.693 ± 0.009  ns/op
> Single.loadLoad    avgt    3   4.693 ± 0.013  ns/op  <--- upgraded to full
> Single.plain       avgt    3   0.408 ± 0.072  ns/op
> Single.release     avgt    3   0.415 ± 0.016  ns/op
> Single.storeStore  avgt    3   0.416 ± 0.041  ns/op
>
> # -XX:DisableIntrinsic=_fullFence
> Single.acquire     avgt    3   0.406 ± 0.014  ns/op
> Single.full        avgt    3  15.836 ± 0.151  ns/op  <--- calls runtime
> Single.loadLoad    avgt    3   0.406 ± 0.001  ns/op
> Single.plain       avgt    3   0.426 ± 0.361  ns/op
> Single.release     avgt    3   0.407 ± 0.021  ns/op
> Single.storeStore  avgt    3   0.410 ± 0.061  ns/op
>
> # -XX:DisableIntrinsic=_fullFence,_loadFence
> Single.acquire     avgt    3  15.822 ± 0.282  ns/op  <--- upgraded, calls runtime
> Single.full        avgt    3  15.851 ± 0.127  ns/op  <--- calls runtime
> Single.loadLoad    avgt    3  15.829 ± 0.045  ns/op  <--- upgraded, calls runtime
> Single.plain       avgt    3   0.406 ± 0.001  ns/op
> Single.release     avgt    3   0.414 ± 0.156  ns/op
> Single.storeStore  avgt    3   0.422 ± 0.452  ns/op
>
> # -XX:DisableIntrinsic=_fullFence,_storeFence
> Single.acquire     avgt    3   0.407 ± 0.016  ns/op
> Single.full        avgt    3  15.347 ± 6.783  ns/op  <--- calls runtime
> Single.loadLoad    avgt    3   0.406 ± 0.001  ns/op
> Single.plain       avgt    3   0.406 ± 0.002  ns/op
> Single.release     avgt    3  15.828 ± 0.019  ns/op  <--- upgraded, calls runtime
> Single.storeStore  avgt    3  15.834 ± 0.045  ns/op  <--- upgraded, calls runtime
>
> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence
> Single.acquire     avgt    3  15.838 ± 0.030  ns/op  <--- upgraded, calls runtime
> Single.full        avgt    3  15.854 ± 0.277  ns/op  <--- calls runtime
> Single.loadLoad    avgt    3  15.826 ± 0.160  ns/op  <--- upgraded, calls runtime
> Single.plain       avgt    3   0.406 ± 0.003  ns/op
> Single.release     avgt    3  15.838 ± 0.019  ns/op  <--- upgraded, calls runtime
> Single.storeStore  avgt    3  15.844 ± 0.104  ns/op  <--- upgraded, calls runtime
>
>
> Additional testing:
>  - [x] Linux x86_64 fastdebug `tier1`

src/hotspot/share/classfile/vmIntrinsics.hpp line 526:

> 524: do_name( storeFence_name, "storeFence") \
> 525: do_alias( storeFence_signature, void_method_signature) \
> 526: do_intrinsic(_fullFence, jdk_internal_misc_Unsafe, fullFence_name, fullFence_signature, F_R) \

Why did you drop the N from F_RN? AFAICS the fullFence method is still native.
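The fallback shape under review can be sketched in plain Java. This is only an illustration under stated assumptions, not the actual `jdk.internal.misc.Unsafe` source: the class name `FenceFallbackSketch` and its call counter are hypothetical, and the public `VarHandle.fullFence()` stands in for the native, intrinsified `Unsafe.fullFence`.

```java
import java.lang.invoke.VarHandle;

public class FenceFallbackSketch {
    // Hypothetical counter, for illustration only: how often the full fence was issued.
    public static int fullFenceCalls = 0;

    // Stand-in for the native/intrinsified Unsafe.fullFence (assumption: we use
    // the public VarHandle.fullFence() because Unsafe internals are not accessible).
    public static void fullFence() {
        fullFenceCalls++;
        VarHandle.fullFence();
    }

    // Java-level fallbacks: a weaker fence may always be implemented by a
    // stronger one, so both simply delegate to fullFence.
    public static void loadFence()  { fullFence(); }
    public static void storeFence() { fullFence(); }

    // The loadLoad/storeStore flavors delegate to the acquire/release flavors,
    // which matches the "upgraded" rows in the benchmark table.
    public static void loadLoadFence()   { loadFence(); }
    public static void storeStoreFence() { storeFence(); }

    public static void main(String[] args) {
        loadFence();
        storeFence();
        loadLoadFence();
        storeStoreFence();
        System.out.println("fullFence calls: " + fullFenceCalls); // prints "fullFence calls: 4"
    }
}
```

Every weaker fence ends up in `fullFence`, which is stronger than required but always correct. Whether `fullFence` itself is cheap then depends only on that one method being intrinsified.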
-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From dholmes at openjdk.java.net Mon Nov 1 02:18:13 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 1 Nov 2021 02:18:13 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence
In-Reply-To:
References:
Message-ID: <2qd34U1LfATnTafN9vzz8PJx-AjhPv_o_GiKkpWN9MM=.ea3d89b4-a7ec-4993-98f7-6a42139c1796@github.com>

On Thu, 28 Oct 2021 08:47:31 GMT, Aleksey Shipilev wrote:

> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) to call to runtime for a single memory barrier. We can simplify the native `Unsafe` interface by falling back to `fullFence` when `{load|store}Fence` intrinsics are not available. This would be similar to what `Unsafe.{loadLoad|storeStore}Fences` do.
>
> [...]

I'm not quite seeing the motivation here. Your claim is that the non-intrinsic implementations involve a native call and so that is too expensive; yet the new code still relies on the fullFence being intrinsified else it is still a native call and a heavier barrier. If these fences were intrinsified piecemeal then perhaps this is an issue on some platform, but is that really the case? If you intrinsified one wouldn't you intrinsify all?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From ngasson at openjdk.java.net Mon Nov 1 04:12:12 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Mon, 1 Nov 2021 04:12:12 GMT
Subject: RFR: 8274179: AArch64: Support SVE operations with encodable immediates [v4]
In-Reply-To: <6RXxK49iDwBKpqVZar9-4B1AO5z6lLagcjLVBHT5sKo=.a274ceff-4c76-4cad-a9e7-f5f7f148ea83@github.com>
References: <6RXxK49iDwBKpqVZar9-4B1AO5z6lLagcjLVBHT5sKo=.a274ceff-4c76-4cad-a9e7-f5f7f148ea83@github.com>
Message-ID:

On Fri, 29 Oct 2021 09:24:47 GMT, Fei Gao wrote:

>> for(int i = 0; i < LENGTH; i++) {
>>   c[i] = a[i] + 2;
>> }
>>
>> For the case shown above, after superword optimization with SVE,
>> without the patch, the vector add operation always has 2 z-reg inputs,
>> like:
>> mov z16.s, #2
>> add z17.s, z17.s, z16.s
>>
>> Considering SVE supports basic binary operations with immediates,
>> this pattern could be further optimized to:
>> add z16.s, z16.s, #2
>>
>> To implement it, we added some new match rules and assembler rules in
>> the aarch64 backend. We also made some extensions on immediate types
>> and functions to stay backward compatible.
>>
>> With the patch, only these binary integer vector operations, +(add),
>> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for
>> the optimization. Other vector operations are not supported currently.
>>
>> Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64
>> CPU, no new failure.
>>
>> There is no obvious performance uplift but it can help remove one
>> redundant mov instruction.
>
> Fei Gao has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add some assertion lines for help functions
>
>   Change-Id: Ic9120902bd8f8a8ead2e3740435a40f35d21757c

Marked as reviewed by ngasson (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6115

From shade at openjdk.java.net Mon Nov 1 07:36:57 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 07:36:57 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]
In-Reply-To: <4DdPvz6BaeM-ekZb2BB51hbEubCZch8CmnzwjBfE4Wo=.cb94614d-8d67-4bf9-868c-0c2e04d1befe@github.com>
References: <4DdPvz6BaeM-ekZb2BB51hbEubCZch8CmnzwjBfE4Wo=.cb94614d-8d67-4bf9-868c-0c2e04d1befe@github.com>
Message-ID:

On Mon, 1 Nov 2021 02:09:19 GMT, David Holmes wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Restore RN for fullFence
>
> src/hotspot/share/classfile/vmIntrinsics.hpp line 526:
>
>> 524: do_name( storeFence_name, "storeFence") \
>> 525: do_alias( storeFence_signature, void_method_signature) \
>> 526: do_intrinsic(_fullFence, jdk_internal_misc_Unsafe, fullFence_name, fullFence_signature, F_R) \
>
> Why did you drop the N from F_RN? AFAICS the fullFence method is still native.

Good spot! That's indeed incorrect, fixed in the new commit. I am surprised `CheckIntrinsics` did not find this discrepancy. I believe "native" flags are not checked at all? For example, the existing `_hashCode` intrinsic is also `F_R`, while it covers the native `java.lang.Object::hashCode`. I will try to beef up those checks separately.
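The review question above ties the `N` in `F_RN` to the method being declared `native`. As a rough illustration of that property only (plain reflection, which is not how HotSpot's `CheckIntrinsics` works internally; the class `NativeFlagCheck` and its helper are hypothetical names), one can confirm from Java that `java.lang.Object::hashCode` is indeed native:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeFlagCheck {
    // Returns true if the named public no-arg method is declared `native`.
    // Hypothetical helper for illustration; unknown methods are reported
    // as an unchecked exception to keep call sites simple.
    public static boolean isNative(Class<?> holder, String name) {
        try {
            Method m = holder.getMethod(name);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
    }

    public static void main(String[] args) {
        // Object.hashCode is declared native; Object.toString is plain Java.
        System.out.println("hashCode native: " + isNative(Object.class, "hashCode")); // true
        System.out.println("toString native: " + isNative(Object.class, "toString")); // false
    }
}
```

A stricter intrinsic check would compare this kind of modifier information against the declared flag, which is the sort of mismatch the message above says currently goes unnoticed.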
-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From shade at openjdk.java.net Mon Nov 1 07:36:53 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 07:36:53 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]
In-Reply-To:
References:
Message-ID: <6VLgphi_CCvby1B3jzpYuchN6ZT-dFaZ2e9VSba3YsQ=.62b863ac-0b29-47fa-a6d8-2ca49b8dd891@github.com>

> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) to call to runtime for a single memory barrier. We can simplify the native `Unsafe` interface by falling back to `fullFence` when `{load|store}Fence` intrinsics are not available. This would be similar to what `Unsafe.{loadLoad|storeStore}Fences` do.
>
> [...]

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Restore RN for fullFence

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6149/files
  - new: https://git.openjdk.java.net/jdk/pull/6149/files/e2c623be..a0fd03ee

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6149&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6149&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6149.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6149/head:pull/6149

PR: https://git.openjdk.java.net/jdk/pull/6149

From shade at openjdk.java.net Mon Nov 1 08:18:17 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 08:18:17 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence
In-Reply-To: <2qd34U1LfATnTafN9vzz8PJx-AjhPv_o_GiKkpWN9MM=.ea3d89b4-a7ec-4993-98f7-6a42139c1796@github.com>
References: <2qd34U1LfATnTafN9vzz8PJx-AjhPv_o_GiKkpWN9MM=.ea3d89b4-a7ec-4993-98f7-6a42139c1796@github.com>
Message-ID: <766OQW0EKB1-XFSKGDvYLBFPP_I0Kxwj_dI84d1RoeE=.b2da8f04-9fc5-401c-afe5-9f763d130f65@github.com>

On Mon, 1 Nov 2021 02:15:04 GMT, David Holmes wrote:

> I'm not quite seeing the motivation here. Your claim is that the non-intrinsic implementations involve a native call and so that is too expensive; yet the new code still relies on the fullFence being intrinsified else it is still a native call and a heavier barrier. If these fences were intrinsified piecemeal then perhaps this is an issue on some platform, but is that really the case? If you intrinsified one wouldn't you intrinsify all?

Yes, that was not clear, sorry.
For current platforms, it is mostly a maintenance cleanup to shrink the unnecessary Unsafe interfaces: if we disable the `acquireFence` intrinsic, we don't need to call into the native fallback (which would be excessive); instead we can just go to the Java-level fallback (which would also be faster). I am looking at the cases where we would like to only intrinsify `fullFence`, for example for the Zero interpreter. Instead of handling all three flavors of fences, we can get the majority of the performance win by only drilling the interpreter-entry-intrinsic hole for `fullFence`, and letting everything else be handled at the Java level.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From shade at openjdk.java.net Mon Nov 1 09:16:09 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 09:16:09 GMT
Subject: RFR: 8275718: Relax memory constraint on exception counter updates
In-Reply-To:
References:
Message-ID:

On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu wrote:

> This is another instance of counter updates that only need atomic guarantee.

(I am not arguing in favor or against this particular change, but I think we can talk a bit about generic stuff here...)

> I don't know where this guarantee is coming from. Two r-m-w atomic ops must have some guarantee via coherence for the atomic op to actually work. And an implementation could make any atomic r-m-w implementation ensure global immediate visibility. But you cannot assume this is guaranteed for all hardware. Even for a given platform this would need to be a specified guarantee in the architecture manual, not just something deduced/inferred by reasoning.

Hotspot's `memory_order_relaxed` is [aligned](https://github.com/openjdk/jdk/blob/5bb1992b8408a0d196b1afa308bc00d007458dbd/src/hotspot/share/runtime/atomic.hpp#L44-L45) with C++11 atomics semantics. C++11 atomic semantics for relaxed atomic ops require [single modification order consistency](https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering), which implies [coherence](https://en.cppreference.com/w/cpp/atomic/memory_order#Modification_order). All known hardware platforms provide coherence out of the box (they are, indeed, cache-coherent platforms), which is why it is easy to implement in C++ (`mo_relaxed`) and in Java (`VarHandle.(get|set)Opaque`).

I am always confused by "immediate global visibility". The problem with statements that include "immediate", "before", "after" is that they leak in the notion of time, which is ill-defined for a single memory location without any reference to other variables. Maybe you can expand on your concern with an example?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6065

From stefank at openjdk.java.net Mon Nov 1 09:31:14 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Mon, 1 Nov 2021 09:31:14 GMT
Subject: RFR: 8275527: Refactor forward pointer access [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 28 Oct 2021 12:35:37 GMT, Roman Kennke wrote:

>> Accessing the forward pointer is currently a little inconsistent. Some code paths call oopDesc::forwardee() / oopDesc::is_forwarded(), some code paths call forwardee() and check it for ==/!= NULL, some code paths even call markWord::decode_pointer() and markWord::is_marked() instead.
>>
>> This change attempts to make the situation more consistent. For simple cases it preserves oopDesc::forwardee() / is_forwarded(), some cases need to use the markWord for consistency in concurrent GC, they now use markWord::forwardee() and markWord::is_forwarded(). Also, checking whether or not an object is forwarded is now consistently done using is_forwarded() and not by checking forwardee ==/!= NULL. This also resolves the mess in G1 full GC that changes not-forwarded objects to have a NULL (fake-) pointer. This is not necessary, because we can just as well use the lock bits to determine whether or not the object is forwarded.
>>
>> Testing:
>>  - [x] tier
>>  - [x] tier2
>>  - [x] hotspot_gc
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Move forward impl into markWord and add assert

Thanks for doing this change. This looks good to me. I've added a comment below that I think would be nice to get resolved somehow, though I don't need to re-review if you update with any of the suggestions.

src/hotspot/share/oops/markWord.hpp line 253:

> 251:     return cast_to_oop(decode_pointer());
> 252:   }
> 253: };

This brings the forwarded/forwardee terminology into the markWord. The markWord was previously decoupled from those two concepts. I would personally let those function names stay in oopDesc and not leak down into the markWord. If you do want to keep it here, could you update the comments at the top that describe the bits?

    // [ptr | 11] marked used to mark an object

-------------

Marked as reviewed by stefank (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5955

From aph at openjdk.java.net Mon Nov 1 10:44:10 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 1 Nov 2021 10:44:10 GMT
Subject: RFR: 8275718: Relax memory constraint on exception counter updates
In-Reply-To:
References:
Message-ID:

On Sun, 31 Oct 2021 11:53:36 GMT, Andrew Haley wrote:

> > We had internal discussion on this topic, Aleksey pointed out: "All modifications to any particular atomic variable occur in a total order that is specific to this one atomic variable". This guarantee holds even for relaxed atomic load/stores. This is a very basic guarantee.
>
> I think that's true for most processors as a consequence of multi-copy atomicity, but we support Power which is not multi-copy atomic, where stores can become visible to one group of threads before they become visible to all threads.

Sorry, this was something of a red herring.
My main point: imposing ordering with respect to other memory accesses around a counter increment does nothing useful unless you care about the ordering of the increment with respect to those other accesses, which AFAICS you don't in this case.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6065

From shade at openjdk.java.net Mon Nov 1 11:34:26 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 11:34:26 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling
Message-ID:

This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to the non-native intrinsic for `Math.sqrt`. The interpreter would then have to disambiguate the two. It could be made simpler and more consistent.

For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of the public API, so we can instead do the proper special intrinsic for it.

There seem to be no performance regressions with this patch at least on Linux x86_64:

$ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench"

Benchmark                    Mode  Cnt       Score      Error   Units

### Before

StrictMathBench.minDouble   thrpt    4  230921.558 ±  234.238  ops/ms
StrictMathBench.minFloat    thrpt    4  230932.303 ±  126.721  ops/ms
StrictMathBench.minInt      thrpt    4  230917.256 ±   73.008  ops/ms
StrictMathBench.minLong     thrpt    4  194460.828 ±  178.079  ops/ms

StrictMathBench.maxDouble   thrpt    4  230983.180 ±  161.211  ops/ms
StrictMathBench.maxFloat    thrpt    4  230969.290 ±  277.500  ops/ms
StrictMathBench.maxInt      thrpt    4  231033.581 ±  200.015  ops/ms
StrictMathBench.maxLong     thrpt    4  194590.744 ±  114.295  ops/ms

StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms

### After

StrictMathBench.minDouble   thrpt    4  230976.625 ±   67.338  ops/ms
StrictMathBench.minFloat    thrpt    4  230896.021 ±  270.434  ops/ms
StrictMathBench.minInt      thrpt    4  230859.741 ±  403.147  ops/ms
StrictMathBench.minLong     thrpt    4  194456.673 ±  111.557  ops/ms

StrictMathBench.maxDouble   thrpt    4  230890.776 ±   89.924  ops/ms
StrictMathBench.maxFloat    thrpt    4  230918.334 ±   63.160  ops/ms
StrictMathBench.maxInt      thrpt    4  231059.128 ±   51.224  ops/ms
StrictMathBench.maxLong     thrpt    4  194488.210 ±  495.224  ops/ms

StrictMathBench.sqrtDouble  thrpt    4  231023.703 ±  247.330  ops/ms

Additional testing:
 - [x] `StrictMath` benchmarks
 - [x] Linux x86_64 fastdebug `tier1`

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.java.net/jdk/pull/6184/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6184&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276217
  Stats: 66 lines in 16 files changed: 27 ins; 26 del; 13 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6184.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6184/head:pull/6184

PR: https://git.openjdk.java.net/jdk/pull/6184

From mcimadamore at openjdk.java.net Mon Nov 1 12:05:32 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Mon, 1 Nov 2021 12:05:32 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v10]
In-Reply-To:
References:
Message-ID:

> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>
> [1] - https://openjdk.java.net/jeps/419

Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:

 - Add cache for memory address var handles
 - Merge branch 'master' into JEP-419
 - Fix regression in VaList treatment on AArch64 (contributed by @nick-arm)
 - Merge branch 'master' into JEP-419
 - Fix copyright header in TestArrayCopy
 - Fix failing microbenchmarks. Contributed by @FrauBoes (thanks!)
 - * use `invokeWithArguments` to simplify new test
 - Add test for liveness check with high-arity downcalls (make sure that if an exception occurs in a downcall because of liveness, ref count of other resources are left intact).
 - * Fix javadoc issue in VaList
   * Fix bug in concurrent logic for shared scope acquire
 - Address review comments
 - ... and 7 more: https://git.openjdk.java.net/jdk/compare/5bb1992b...9b519343

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5907/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=09
  Stats: 14497 lines in 189 files changed: 6773 ins; 5149 del; 2575 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5907.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907

PR: https://git.openjdk.java.net/jdk/pull/5907

From shade at openjdk.java.net Mon Nov 1 12:27:13 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 1 Nov 2021 12:27:13 GMT
Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]
In-Reply-To:
References: <4DdPvz6BaeM-ekZb2BB51hbEubCZch8CmnzwjBfE4Wo=.cb94614d-8d67-4bf9-868c-0c2e04d1befe@github.com>
Message-ID:

On Mon, 1 Nov 2021 07:32:18 GMT, Aleksey Shipilev wrote:

>> src/hotspot/share/classfile/vmIntrinsics.hpp line 526:
>>
>>> 524: do_name( storeFence_name, "storeFence") \
>>> 525: do_alias( storeFence_signature, void_method_signature) \
>>> 526: do_intrinsic(_fullFence, jdk_internal_misc_Unsafe, fullFence_name, fullFence_signature, F_R) \
>>
>> Why did you drop the N from F_RN? AFAICS the fullFence method is still native.
>
> Good spot! That's indeed incorrect, fixed in the new commit. I am surprised `CheckIntrinsics` did not find this discrepancy. I believe "native" flags are not checked at all? For example, the existing `_hashCode` intrinsic is also `F_R`, while it covers the native `java.lang.Object::hashCode`. I will try to beef up those checks separately.
This `CheckIntrinsics` oddity is handled by #6187.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6149

From duke at openjdk.java.net Mon Nov 1 12:46:12 2021
From: duke at openjdk.java.net (Evgeny Astigeevich)
Date: Mon, 1 Nov 2021 12:46:12 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v10]
In-Reply-To:
References: <8P-tWT-7UC9TMLz8zo5liDy2rOONBU864RUlRhthLeY=.05ef40f6-38ca-4e84-a54b-eb8dbae2b97f@github.com>
Message-ID:

On Fri, 15 Oct 2021 13:09:27 GMT, Andrew Haley wrote:

>> Can we have a simple (as simple as possible) JMH benchmark, please? It should be something like a couple of threads racing to count up to a million.
>
>> @theRealAph, any comments on the microbenchmark I wrote?
>
> Something like this works well:
>
>
>     @Param({"1000000"})
>     public int maxNum;
>
>     @Param({"4"})
>     public int threadCount;
>
>     AtomicInteger theCounter;
>
>     Thread threads[];
>
>     void work() {
>         for (;;) {
>             int prev = theCounter.get();
>             if (prev >= maxNum) {
>                 break;
>             }
>             if (theCounter.compareAndExchange(prev, prev + 1) != prev) {
>                 Thread.onSpinWait();
>             }
>         }
>     }
>
>     @Setup(Level.Trial)
>     public void foo() {
>         theCounter = new AtomicInteger();
>     }
>
>     @Setup(Level.Invocation)
>     public void setup() {
>         theCounter.set(0);
>         threads = new Thread[threadCount];
>
>         for (int i = 0; i < threads.length; i++) {
>             threads[i] = new Thread(this::work);
>         }
>
>     }
>
>     @Benchmark
>     public void trial() throws Exception {
>         for (int i = 0; i < threads.length; i++) {
>             threads[i].start();
>         }
>         for (int i = 0; i < threads.length; i++) {
>             threads[i].join();
>         }
>     }
> }
>
> Before:
>
> Benchmark               (maxNum)  (threadCount)  Mode  Cnt   Score    Error  Units
> ThreadOnSpinWait.trial   1000000              2  avgt    3  43.830 ± 32.543  ms/op
>
> With `-XX:OnSpinWaitInst=isb -XX:OnSpinWaitInstCount=4`
>
> Benchmark               (maxNum)  (threadCount)  Mode  Cnt   Score    Error  Units
> ThreadOnSpinWait.trial   1000000              2  avgt    3  22.181 ± 11.592  ms/op
>
> With `-XX:OnSpinWaitInst=isb -XX:OnSpinWaitInstCount=1`
>
> Benchmark               (maxNum)  (threadCount)  Mode  Cnt   Score    Error  Units
> ThreadOnSpinWait.trial   1000000              2  avgt    3  36.281 ± 31.700  ms/op
>
>
> This is Apple M1, where you have to be very careful because there's some processor
> frequency scaling going on.

Hi @theRealAph,
I see there are no other comments. Can I proceed to integrate?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From aph at openjdk.java.net Mon Nov 1 13:11:08 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 1 Nov 2021 13:11:08 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling
In-Reply-To:
References:
Message-ID:

On Mon, 1 Nov 2021 11:23:16 GMT, Aleksey Shipilev wrote:

> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to the non-native intrinsic for `Math.sqrt`. The interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
>
> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of the public API, so we can instead do the proper special intrinsic for it.
>
> [...]

So we have `_dsqrt` and `_dsqrt_strict`, which must be functionally identical, but we provide both names because they're part of a public API. I think this deserves an explanatory comment in the code.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From aph at openjdk.java.net Mon Nov 1 13:15:14 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 1 Nov 2021 13:15:14 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v13]
In-Reply-To:
References:
Message-ID: <31DzKXEmMNYWZ1NL3FroXD7dCIDhwBJNzRotZCkKTqg=.30bc4ee0-9701-4cf7-925d-27901f47cdcc@github.com>

On Thu, 21 Oct 2021 15:19:47 GMT, Evgeny Astigeevich wrote:

>> Looks good. I'm not entirely sure whether this test is truly representative of the real-world cases that people have seen, but if we find out more we can always add another JMH test.
>
> This test is too artificial. Going through my records I've found I have a microbenchmark for `java.util.concurrent.SynchronousQueue` which shows good improvements on jdk11.
`SynchronousQueue` uses `onSpinWait`. Since jdk17 `SynchronousQueue` has not been using `onSpinWait` any more (See https://bugs.openjdk.java.net/browse/JDK-8267502). Maybe I can come up with a microbenchmark based on `SynchronousQueue` [code](https://github.com/openjdk/jdk11u-dev/blob/master/src/java.base/share/classes/java/util/concurrent/SynchronousQueue.java#L412): > > SNode awaitFulfill(SNode s, boolean timed, long nanos) { > /* > * When a node/thread is about to block, it sets its waiter > * field and then rechecks state at least one more time > * before actually parking, thus covering race vs > * fulfiller noticing that waiter is non-null so should be > * woken. > * > * When invoked by nodes that appear at the point of call > * to be at the head of the stack, calls to park are > * preceded by spins to avoid blocking when producers and > * consumers are arriving very close in time. This can > * happen enough to bother only on multiprocessors. > * > * The order of checks for returning out of main loop > * reflects fact that interrupts have precedence over > * normal returns, which have precedence over > * timeouts. (So, on timeout, one last check for match is > * done before giving up.) Except that calls from untimed > * SynchronousQueue.{poll/offer} don't check interrupts > * and don't wait at all, so are trapped in transfer > * method rather than calling awaitFulfill. > */ > final long deadline = timed ? System.nanoTime() + nanos : 0L; > Thread w = Thread.currentThread(); > int spins = shouldSpin(s) > ? (timed ? MAX_TIMED_SPINS : MAX_UNTIMED_SPINS) > : 0; > for (;;) { > if (w.isInterrupted()) > s.tryCancel(); > SNode m = s.match; > if (m != null) > return m; > if (timed) { > nanos = deadline - System.nanoTime(); > if (nanos <= 0L) { > s.tryCancel(); > continue; > } > } > if (spins > 0) { > Thread.onSpinWait(); > spins = shouldSpin(s) ? 
(spins - 1) : 0; > } > else if (s.waiter == null) > s.waiter = w; // establish waiter so can park next iter > else if (!timed) > LockSupport.park(this); > else if (nanos > SPIN_FOR_TIMEOUT_THRESHOLD) > LockSupport.parkNanos(this, nanos); > } > } > > > I've created https://bugs.openjdk.java.net/browse/JDK-8275728 to write such a microbenchmark. I suggest you do https://bugs.openjdk.java.net/browse/JDK-8275728 before you commit this. A benchmark which proves that this patch has some utility is needed, isn't it? ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From shade at openjdk.java.net Mon Nov 1 15:35:36 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 1 Nov 2021 15:35:36 GMT Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v2] In-Reply-To: References: Message-ID: > This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has interesting consequence for matchers: we can match the native `StrictMath.sqrt` to non-native intrinsic for `Math.sqrt`. Interpreter would then have to disambiguate the two. It could be made simpler and more consistent. > > For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it. > > There seem to be no performance regressions with this patch at least on Linux x86_64: > > > $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" > > Benchmark Mode Cnt Score Error Units > > ### Before > > StrictMathBench.minDouble thrpt 4 230921.558 ? 234.238 ops/ms > StrictMathBench.minFloat thrpt 4 230932.303 ? 126.721 ops/ms > StrictMathBench.minInt thrpt 4 230917.256 ? 73.008 ops/ms > StrictMathBench.minLong thrpt 4 194460.828 ? 
178.079 ops/ms > > > StrictMathBench.maxDouble thrpt 4 230983.180 ? 161.211 ops/ms > StrictMathBench.maxFloat thrpt 4 230969.290 ? 277.500 ops/ms > StrictMathBench.maxInt thrpt 4 231033.581 ? 200.015 ops/ms > StrictMathBench.maxLong thrpt 4 194590.744 ? 114.295 ops/ms > > > StrictMathBench.sqrtDouble thrpt 4 230722.037 ? 2222.080 ops/ms > > ### After > > StrictMathBench.minDouble thrpt 4 230976.625 ? 67.338 ops/ms > StrictMathBench.minFloat thrpt 4 230896.021 ? 270.434 ops/ms > StrictMathBench.minInt thrpt 4 230859.741 ? 403.147 ops/ms > StrictMathBench.minLong thrpt 4 194456.673 ? 111.557 ops/ms > > StrictMathBench.maxDouble thrpt 4 230890.776 ? 89.924 ops/ms > StrictMathBench.maxFloat thrpt 4 230918.334 ? 63.160 ops/ms > StrictMathBench.maxInt thrpt 4 231059.128 ? 51.224 ops/ms > StrictMathBench.maxLong thrpt 4 194488.210 ? 495.224 ops/ms > > StrictMathBench.sqrtDouble thrpt 4 231023.703 ? 247.330 ops/ms > > > Additional testing: > - [x] `StrictMath` benchmarks > - [x] Linux x86_64 fastdebug `tier1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Touchups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6184/files - new: https://git.openjdk.java.net/jdk/pull/6184/files/4cd966dc..27202fa4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6184&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6184&range=00-01 Stats: 8 lines in 3 files changed: 4 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6184.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6184/head:pull/6184 PR: https://git.openjdk.java.net/jdk/pull/6184 From shade at openjdk.java.net Mon Nov 1 15:35:36 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 1 Nov 2021 15:35:36 GMT Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v2] In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 13:08:05 GMT, Andrew Haley wrote: > So we have _dsqrt 
and_dsqrt_strict, which must be functionally identical, but we provide both names because they're part of a public API. I think this deserves an explanatory comment in the code. Yes, no problem, added comment near intrinsic definition. ------------- PR: https://git.openjdk.java.net/jdk/pull/6184 From mcimadamore at openjdk.java.net Mon Nov 1 17:15:55 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 1 Nov 2021 17:15:55 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v11] In-Reply-To: References: Message-ID: <8DLqVOZo6ZXYqntQe91nI4wIKu0_gn0DY-l8MA2rznM=.fdab6f3c-119e-492d-b61c-6314d51cdd58@github.com> > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix liveness issue with loader lookups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/9b519343..17f45861 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=09-10 Stats: 191 lines in 6 files changed: 187 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From shade at openjdk.java.net Mon Nov 1 17:54:09 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 1 Nov 2021 17:54:09 GMT Subject: RFR: 8252990: Intrinsify Unsafe.storeStoreFence [v2] In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 08:58:48 GMT, Aleksey Shipilev wrote: >> `Unsafe.storeStoreFence` currently delegates to stronger `Unsafe.storeFence`. 
We can teach compilers to map this directly to already existing rules that handle `MemBarStoreStore`. Like explicit `LoadFence`/`StoreFence`, we introduce the special node to differentiate explicit fence and implicit store-store barriers. `storeStoreFence` is usually used to simulate safe `final`-field like constructions in special JDK classes, like `ConstantCallSite` and friends. >> >> Motivational performance difference on benchmarks from JDK-8276054 on ARM32 (Raspberry Pi 4): >> >> >> Benchmark Mode Cnt Score Error Units >> Multiple.plain avgt 3 2.669 ? 0.004 ns/op >> Multiple.release avgt 3 16.688 ? 0.057 ns/op >> Multiple.storeStore avgt 3 14.021 ? 0.144 ns/op // Better >> >> MultipleWithLoads.plain avgt 3 4.672 ? 0.053 ns/op >> MultipleWithLoads.release avgt 3 16.689 ? 0.044 ns/op >> MultipleWithLoads.storeStore avgt 3 14.012 ? 0.010 ns/op // Better >> >> MultipleWithStores.plain avgt 3 14.687 ? 0.009 ns/op >> MultipleWithStores.release avgt 3 45.393 ? 0.192 ns/op >> MultipleWithStores.storeStore avgt 3 38.048 ? 0.033 ns/op // Better >> >> Publishing.plain avgt 3 27.079 ? 0.201 ns/op >> Publishing.release avgt 3 27.088 ? 0.241 ns/op >> Publishing.storeStore avgt 3 27.009 ? 0.259 ns/op // Within error, hidden by allocation >> >> Single.plain avgt 3 2.670 ? 0.002 ns/op >> Single.releaseFence avgt 3 6.675 ? 0.001 ns/op >> Single.storeStoreFence avgt 3 8.012 ? 0.027 ns/op // Worse, seems to be ARM32 implementation artifact >> >> >> The same thing on AArch64 (Raspberry Pi 3): >> >> >> Benchmark Mode Cnt Score Error Units >> >> Multiple.plain avgt 3 5.914 ? 0.115 ns/op >> Multiple.release avgt 3 10.149 ? 0.059 ns/op >> Multiple.storeStore avgt 3 6.757 ? 0.138 ns/op // Better >> >> MultipleWithLoads.plain avgt 3 11.849 ? 0.331 ns/op >> MultipleWithLoads.release avgt 3 35.565 ? 1.144 ns/op >> MultipleWithLoads.storeStore avgt 3 19.441 ? 0.471 ns/op // Better >> >> MultipleWithStores.plain avgt 3 5.920 ? 0.213 ns/op >> MultipleWithStores.release avgt 3 20.286 ? 
0.347 ns/op >> MultipleWithStores.storeStore avgt 3 12.686 ? 0.230 ns/op // Better >> >> Publishing.plain avgt 3 22.261 ? 1.630 ns/op >> Publishing.release avgt 3 22.269 ? 0.576 ns/op >> Publishing.storeStore avgt 3 17.464 ? 0.397 ns/op // Better >> >> Single.plain avgt 3 5.916 ? 0.063 ns/op >> Single.release avgt 3 10.148 ? 0.401 ns/op >> Single.storeStore avgt 3 6.767 ? 0.164 ns/op // Better >> >> >> As expected, this does not affect x86_64 at all, because both `release` and `storeStore` are effectively no-ops, only affecting compiler optimizations: >> >> >> Benchmark Mode Cnt Score Error Units >> >> Multiple.plain avgt 3 0.406 ? 0.002 ns/op >> Multiple.release avgt 3 0.409 ? 0.018 ns/op >> Multiple.storeStore avgt 3 0.406 ? 0.001 ns/op >> >> MultipleWithLoads.plain avgt 3 4.328 ? 0.006 ns/op >> MultipleWithLoads.release avgt 3 4.600 ? 0.014 ns/op >> MultipleWithLoads.storeStore avgt 3 4.602 ? 0.006 ns/op >> >> MultipleWithStores.plain avgt 3 0.812 ? 0.001 ns/op >> MultipleWithStores.release avgt 3 0.812 ? 0.002 ns/op >> MultipleWithStores.storeStore avgt 3 0.812 ? 0.002 ns/op >> >> Publishing.plain avgt 3 6.370 ? 0.059 ns/op >> Publishing.release avgt 3 6.358 ? 0.436 ns/op >> Publishing.storeStore avgt 3 6.367 ? 0.054 ns/op >> >> Single.plain avgt 3 0.407 ? 0.039 ns/op >> Single.releaseFence avgt 3 0.406 ? 0.001 ns/op >> Single.storeStoreFence avgt 3 0.406 ? 0.001 ns/op >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Fix the comment to match JDK-8276096 Finally revived my quiet AArch64 dev board, added AArch64 results, which are even better than ARM32. Updated PR with perf results. 
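For readers unfamiliar with the `final`-field-like publication pattern mentioned in the PR description, here is a minimal sketch. It assumes the public `VarHandle.storeStoreFence()` as a stand-in for the internal `Unsafe.storeStoreFence`. The names `StoreStoreFenceSketch`, `Holder` and `publishAndRead` are hypothetical and are not taken from the patch.

```java
import java.lang.invoke.VarHandle;

// Hypothetical illustration only; not code from the PR.
final class StoreStoreFenceSketch {
    static final class Holder {
        int value; // deliberately non-final

        Holder(int value) {
            this.value = value;
            // Keep the field store above from being reordered with later
            // stores, such as the store that publishes this object.
            VarHandle.storeStoreFence();
        }
    }

    static Holder shared;

    static Holder publishAndRead(int value) {
        Thread writer = new Thread(() -> shared = new Holder(value));
        writer.start();
        try {
            // join() already provides a happens-before edge, so this demo is
            // deterministic; the fence matters when publication is racy.
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead(42).value); // prints 42
    }
}
```

The sketch only shows where the fence sits relative to the field store. It makes no claim about the generated code on any particular platform.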
------------- PR: https://git.openjdk.java.net/jdk/pull/6136 From harold.seigel at oracle.com Mon Nov 1 17:57:55 2021 From: harold.seigel at oracle.com (Harold Seigel) Date: Mon, 1 Nov 2021 13:57:55 -0400 Subject: Incorrect hehavior on the class name (UTF8) in the constant pool of bytecode In-Reply-To: References: Message-ID: <975eac75-2454-4c90-507d-290384f3a5f5@oracle.com> Hi Cheng, Thank you for reporting this problem and providing a sample program. I've created JBS bug https://bugs.openjdk.java.net/browse/JDK-8276241 for this problem. You can follow progress of the issue by using that link. Thanks, Harold On 10/28/2021 8:25 PM, Cheng Jin wrote: > One or more of the following files ( dumped.class ) violates IBM policy and all attachment(s) have been removed from the message. > > ********************************************************************** > > > Hi There, > > I created a simple test that loads a class file as follows to see whether a > package name in the constant pool is rejected as invalid for a class name. > However, it surprised me that it just passed without any exception on > Hotspot (e.g. OpenJDK11). > > (See attached file: dumped.class) > > constant_pool (in dumped.class) > ... > 3. Utf8 > tag: 1 > length: 16 > bytes: die/verwandlung/ <----- a package name rather than a valid class > name > 4.
Class > tag: 7 > name_index: 3 <------ > > > import java.io.*; > public class CustomClassLoader extends ClassLoader { > > @Override > public Class findClass(String fileName) throws ClassNotFoundException { > byte[] b = loadClassBytes(fileName); > return defineClass(null, b, 0, b.length); > } > > private byte[] loadClassBytes(String fileName) { > InputStream inputStream = > getClass().getClassLoader().getResourceAsStream(fileName + ".class"); > ByteArrayOutputStream byteOutStream = new ByteArrayOutputStream(); > try { > int nextByte = 0; > while ((nextByte = inputStream.read()) != -1) { > byteOutStream.write(nextByte); > } > } catch (IOException e) { > e.printStackTrace(); > } > return byteOutStream.toByteArray(); > } > > public static void main(String args[]) { > try { > CustomClassLoader cl = new CustomClassLoader(); > cl.findClass("dumped"); > System.out.println("DONE....."); > } catch (Exception e) { > e.printStackTrace(); > } > } > } > > $ jdk11_hotspot/bin/java CustomClassLoader > DONE..... > > > According to the VM Spec at 4.2.1 Binary Class and Interface Names > > Class and interface names that appear in class file structures are always > represented in a fully qualified form known as binary names (JLS ?13.1). > ...In this internal form, the ASCII periods (.) that normally separate the > identifiers which > make up the binary name are replaced by ASCII forward slashes (/). The > identifiers > themselves must be unqualified names (?4.2.2). > > For example, the normal binary name of class Thread is java.lang.Thread. In > the > internal form used in descriptors in the class file format, a reference to > the name of class > Thread is implemented using a CONSTANT_Utf8_info structure representing the > string > java/lang/Thread. > > It means a valid class name should be something like "xxx" or > "xxx/yyy/zzz" (where "/" only serves as the separator in between, and "/" > shouldn't occur at the end), > in which case "xxx/yyy/" is treated as invalid for a class name. 
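The naming rule quoted above can be sketched as a small check. This is a simplified reading of JVMS 4.2.1 and 4.2.2, not HotSpot's actual class file parsing code. The class `InternalNameCheck` and its method are hypothetical, and array class names (which use descriptor syntax) are deliberately out of scope.

```java
// Hypothetical helper; a simplified reading of JVMS 4.2.1 / 4.2.2.
final class InternalNameCheck {
    static boolean isValidInternalClassName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        // A limit of -1 keeps trailing empty segments, so "a/b/" yields
        // ["a", "b", ""] and the empty last identifier is rejected.
        for (String part : name.split("/", -1)) {
            if (part.isEmpty()) {
                return false;
            }
            for (int i = 0; i < part.length(); i++) {
                char c = part.charAt(i);
                // Unqualified names must not contain '.', ';' or '['
                // ('/' cannot occur here because we split on it).
                if (c == '.' || c == ';' || c == '[') {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidInternalClassName("java/lang/Thread"));  // true
        System.out.println(isValidInternalClassName("die/verwandlung/")); // false
        System.out.println(isValidInternalClassName("java.lang.Thread")); // false
    }
}
```

Under this reading, the `die/verwandlung/` entry from the dumped constant pool would be rejected because its last identifier is empty.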
> > > So I am wondering why Hotspot doesn't follow the VM Spec to check the > invalid package name in the constant pool. > > > Thanks and Best Regards > Cheng Jin From kvn at openjdk.java.net Mon Nov 1 18:48:10 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 1 Nov 2021 18:48:10 GMT Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v2] In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 15:35:36 GMT, Aleksey Shipilev wrote: >> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has interesting consequence for matchers: we can match the native `StrictMath.sqrt` to non-native intrinsic for `Math.sqrt`. Interpreter would then have to disambiguate the two. It could be made simpler and more consistent. >> >> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it. >> >> There seem to be no performance regressions with this patch at least on Linux x86_64: >> >> >> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" >> >> Benchmark Mode Cnt Score Error Units >> >> ### Before >> >> StrictMathBench.minDouble thrpt 4 230921.558 ? 234.238 ops/ms >> StrictMathBench.minFloat thrpt 4 230932.303 ? 126.721 ops/ms >> StrictMathBench.minInt thrpt 4 230917.256 ? 73.008 ops/ms >> StrictMathBench.minLong thrpt 4 194460.828 ? 178.079 ops/ms >> >> >> StrictMathBench.maxDouble thrpt 4 230983.180 ? 161.211 ops/ms >> StrictMathBench.maxFloat thrpt 4 230969.290 ? 277.500 ops/ms >> StrictMathBench.maxInt thrpt 4 231033.581 ? 200.015 ops/ms >> StrictMathBench.maxLong thrpt 4 194590.744 ? 114.295 ops/ms >> >> >> StrictMathBench.sqrtDouble thrpt 4 230722.037 ? 
2222.080 ops/ms >> >> ### After >> >> StrictMathBench.minDouble thrpt 4 230976.625 ? 67.338 ops/ms >> StrictMathBench.minFloat thrpt 4 230896.021 ? 270.434 ops/ms >> StrictMathBench.minInt thrpt 4 230859.741 ? 403.147 ops/ms >> StrictMathBench.minLong thrpt 4 194456.673 ? 111.557 ops/ms >> >> StrictMathBench.maxDouble thrpt 4 230890.776 ? 89.924 ops/ms >> StrictMathBench.maxFloat thrpt 4 230918.334 ? 63.160 ops/ms >> StrictMathBench.maxInt thrpt 4 231059.128 ? 51.224 ops/ms >> StrictMathBench.maxLong thrpt 4 194488.210 ? 495.224 ops/ms >> >> StrictMathBench.sqrtDouble thrpt 4 231023.703 ? 247.330 ops/ms >> >> >> Additional testing: >> - [x] `StrictMath` benchmarks >> - [x] Linux x86_64 fastdebug `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Touchups Removing intrinsics for StrictMath `min/max` methods may prevent them from inlining if they are not hot when the caller is compiled. ------------- PR: https://git.openjdk.java.net/jdk/pull/6184 From shade at openjdk.java.net Mon Nov 1 19:00:13 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 1 Nov 2021 19:00:13 GMT Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v2] In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 18:44:53 GMT, Vladimir Kozlov wrote: > Removing intrinsics for StrictMath `min/max` methods may prevent them from inlining if they are not hot when the caller is compiled. Would you like me to leave them instead? That would mean we introduce these new intrinsic definitions: /* StrictMath intrinsics, similar to what we have in Math.
*/ \ do_intrinsic(_min_strict, java_lang_StrictMath, min_name, int2_int_signature, F_S) \ do_intrinsic(_max_strict, java_lang_StrictMath, max_name, int2_int_signature, F_S) \ do_intrinsic(_minF_strict, java_lang_StrictMath, min_name, float2_float_signature, F_S) \ do_intrinsic(_maxF_strict, java_lang_StrictMath, max_name, float2_float_signature, F_S) \ do_intrinsic(_minD_strict, java_lang_StrictMath, min_name, double2_double_signature, F_S) \ do_intrinsic(_maxD_strict, java_lang_StrictMath, max_name, double2_double_signature, F_S) \ /* Special flavor of dsqrt intrinsic to handle the "native" method in StrictMath. Otherwise the same as in Math. */ \ do_intrinsic(_dsqrt_strict, java_lang_StrictMath, sqrt_name, double_double_signature, F_SN) \ ------------- PR: https://git.openjdk.java.net/jdk/pull/6184 From mcimadamore at openjdk.java.net Mon Nov 1 22:36:40 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 1 Nov 2021 22:36:40 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v12] In-Reply-To: References: Message-ID: > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Tweak javadoc of loaderLookup ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/17f45861..7cf4fcd9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=11 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=10-11 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From kvn at openjdk.java.net Mon Nov 1 23:02:10 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 1 Nov 2021 23:02:10 GMT Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v2] In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 15:35:36 GMT, Aleksey Shipilev wrote: >> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has interesting consequence for matchers: we can match the native `StrictMath.sqrt` to non-native intrinsic for `Math.sqrt`. Interpreter would then have to disambiguate the two. It could be made simpler and more consistent. >> >> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it. >> >> There seem to be no performance regressions with this patch at least on Linux x86_64: >> >> >> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" >> >> Benchmark Mode Cnt Score Error Units >> >> ### Before >> >> StrictMathBench.minDouble thrpt 4 230921.558 ? 
234.238 ops/ms >> StrictMathBench.minFloat thrpt 4 230932.303 ? 126.721 ops/ms >> StrictMathBench.minInt thrpt 4 230917.256 ? 73.008 ops/ms >> StrictMathBench.minLong thrpt 4 194460.828 ? 178.079 ops/ms >> >> >> StrictMathBench.maxDouble thrpt 4 230983.180 ? 161.211 ops/ms >> StrictMathBench.maxFloat thrpt 4 230969.290 ? 277.500 ops/ms >> StrictMathBench.maxInt thrpt 4 231033.581 ? 200.015 ops/ms >> StrictMathBench.maxLong thrpt 4 194590.744 ? 114.295 ops/ms >> >> >> StrictMathBench.sqrtDouble thrpt 4 230722.037 ? 2222.080 ops/ms >> >> ### After >> >> StrictMathBench.minDouble thrpt 4 230976.625 ? 67.338 ops/ms >> StrictMathBench.minFloat thrpt 4 230896.021 ? 270.434 ops/ms >> StrictMathBench.minInt thrpt 4 230859.741 ? 403.147 ops/ms >> StrictMathBench.minLong thrpt 4 194456.673 ? 111.557 ops/ms >> >> StrictMathBench.maxDouble thrpt 4 230890.776 ? 89.924 ops/ms >> StrictMathBench.maxFloat thrpt 4 230918.334 ? 63.160 ops/ms >> StrictMathBench.maxInt thrpt 4 231059.128 ? 51.224 ops/ms >> StrictMathBench.maxLong thrpt 4 194488.210 ? 495.224 ops/ms >> >> StrictMathBench.sqrtDouble thrpt 4 231023.703 ? 247.330 ops/ms >> >> >> Additional testing: >> - [x] `StrictMath` benchmarks >> - [x] Linux x86_64 fastdebug `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Touchups Yes, I am fine with new intrinsics for them. ------------- PR: https://git.openjdk.java.net/jdk/pull/6184 From psandoz at openjdk.java.net Tue Nov 2 00:27:23 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Tue, 2 Nov 2021 00:27:23 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v10] In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 12:05:32 GMT, Maurizio Cimadamore wrote: >> This PR contains the API and implementation changes for JEP-419 [1]. 
A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.java.net/jeps/419 > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Add cache for memory address var handles > - Merge branch 'master' into JEP-419 > - Fix regression in VaList treatment on AArch64 (contributed by @nick-arm) > - Merge branch 'master' into JEP-419 > - Fix copyright header in TestArrayCopy > - Fix failing microbenchmarks. Contributed by @FrauBoes (thanks!) > - * use `invokeWithArguments` to simplify new test > - Add test for liveness check with high-aririty downcalls > (make sure that if an exception occurs in a downcall because of liveness, > ref count of other resources are left intact). > - * Fix javadoc issue in VaList > * Fix bug in concurrent logic for shared scope acquire > - Address review comments > - ... and 7 more: https://git.openjdk.java.net/jdk/compare/5bb1992b...9b519343 src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/Utils.java line 111: > 109: class VarHandleCache { > 110: private static final Map handleMap = new ConcurrentHashMap<>(); > 111: private static final Map handleMapNoAlignCheck = new ConcurrentHashMap<>(); Something to consider later if this is an issue. Since the number of `ValueLayout` instances is fixed, carrier x order = 18, we can use stable arrays with ordinals on the instances. 
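A rough sketch of the ordinal-indexed idea follows, under stated assumptions. The `Carrier` enum is hypothetical (six carriers here rather than the layouts in the patch), `AtomicReferenceArray` stands in for a JDK-internal stable array, and `MethodHandles.byteBufferViewVarHandle` is used only to have a real `VarHandle` to cache.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch; not the cache from the PR.
final class OrdinalVarHandleCache {
    // Hypothetical carrier set; each constant maps to an array type that
    // byteBufferViewVarHandle accepts.
    enum Carrier {
        CHAR(char[].class), SHORT(short[].class), INT(int[].class),
        LONG(long[].class), FLOAT(float[].class), DOUBLE(double[].class);

        final Class<?> arrayType;

        Carrier(Class<?> arrayType) {
            this.arrayType = arrayType;
        }
    }

    private static final int ORDERS = 2; // big-endian and little-endian

    // One slot per (carrier, order) pair, filled lazily.
    private static final AtomicReferenceArray<VarHandle> HANDLES =
            new AtomicReferenceArray<>(Carrier.values().length * ORDERS);

    static int slot(Carrier carrier, ByteOrder order) {
        return carrier.ordinal() * ORDERS + (order == ByteOrder.BIG_ENDIAN ? 0 : 1);
    }

    static VarHandle handleFor(Carrier carrier, ByteOrder order) {
        int i = slot(carrier, order);
        VarHandle handle = HANDLES.get(i);
        if (handle == null) {
            VarHandle created = MethodHandles.byteBufferViewVarHandle(carrier.arrayType, order);
            // If another thread won the race, use its handle instead.
            handle = HANDLES.compareAndSet(i, null, created) ? created : HANDLES.get(i);
        }
        return handle;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        VarHandle intBE = handleFor(Carrier.INT, ByteOrder.BIG_ENDIAN);
        intBE.set(buffer, 0, 0x01020304);
        System.out.println(buffer.get(0)); // prints 1 (most significant byte first)
        System.out.println(intBE == handleFor(Carrier.INT, ByteOrder.BIG_ENDIAN)); // prints true
    }
}
```

Indexing by ordinal avoids hashing and key objects on lookup. The trade-off is that it only works while the set of keys is fixed and small, which is the premise of the comment above.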
------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From duke at openjdk.java.net Tue Nov 2 02:31:15 2021 From: duke at openjdk.java.net (Vamsi Parasa) Date: Tue, 2 Nov 2021 02:31:15 GMT Subject: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2] In-Reply-To: References: <7IzrZdL0elgXbuisyLNYC2wkyOTe1RHUPuGRI7YsAQ4=.aed9dea3-4775-4592-b43e-c3e08e167f90@github.com> Message-ID: On Tue, 19 Oct 2021 20:34:55 GMT, Vamsi Parasa wrote: >> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change shows a 1.87X improvement on a micro benchmark. > > Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > refactoring to remove code duplication by using a common routine for UMulHiLNode and MulHiLNode Thank you for spotting the stale comment. It will be removed in another related commit that will be pushed soon. ------------- PR: https://git.openjdk.java.net/jdk/pull/5933 From denghui.ddh at alibaba-inc.com Tue Nov 2 03:09:56 2021 From: denghui.ddh at alibaba-inc.com (Denghui Dong) Date: Tue, 02 Nov 2021 11:09:56 +0800 Subject: Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd In-Reply-To: <6581e9e4-851a-c562-37af-915ff3fdc492@oracle.com> References: <7f2df6ad-7d73-46ac-a23e-959fd6b4d4af.denghui.ddh@alibaba-inc.com>, <6581e9e4-851a-c562-37af-915ff3fdc492@oracle.com> Message-ID: <8a5c6087-a131-4ddd-9195-0f1e51705351.denghui.ddh@alibaba-inc.com> Hi Chris, Thank you for the comments. Yes, we have no good way to restrict user-registered commands to diagnostic operations only, but in my opinion, this does not seem to be a problem that must be solved perfectly. The following are my thoughts. This extension is an entry point that triggers the operation that the user wants to perform (similar to the Signal Handler mechanism but with a name and parameters).
Even without this extension, users have other ways to achieve the same goal. On the one hand, we could document the intended usage scenarios of the API (indeed, users can still write programs that do not follow the specification: for example, they can implement an object's hashCode method so that repeated calls return different values, or make an object alive again while its finalize method is executing). On the other hand, we can add some restrictions to help users make better use of this extension, e.g. we can add a new VM option, such as EnableUserLevelDCmd, so that the application can only register custom commands when this option is enabled. Or from another perspective, can we allow users to do some non-diagnostic-related operations in custom commands? Best, Denghui ------------------------------------------------------------------ From:Chris Plummer Send Time:2021-11-02 (Tue) 03:35 To:???(??) ; serviceability-dev ; hotspot-dev Subject:Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd I have similar concerns to those others have expressed, so I'll try to add something new to the discussion and not just repeat. DCMDs have historically been very VM centric. That's not to say they aren't useful for debugging applications, but they do so by providing VM related info like stack traces, heap dumps, and class histograms. Also hotspot has been the gatekeeper for new DCMDs, meaning that new ones do not get added without going through the hotspot review process. Allowing any application or framework to add a DCMD changes this VM centric view in a way that concerns me. This approach allows a DCMD to pretty much do anything (java security notwithstanding). App writers could even use them to provide a user facing interface. For example, if an app has some sort of internal database, it could allow users to query it via a DCMD, and maybe even suggest that users write simple shell scripts that use jcmd to do these queries.
Allowing this type of non-diagnostic usage seems like a path we don't want to go down, yet I don't see how it can be prevented once you allow applications to add DCMDs. Chris On 10/25/21 1:37 AM, Denghui Dong wrote: Hi there! We'd like to discuss a proposal for extending the current DCmd framework to support Java level DCmd. At present, DCmd only allows the VM to register commands, which can be called through jcmd or JMX. It would be beneficial if the user could create their own commands. The idea of this extension originally came from our internal Java agent that detects the misusage of Unsafe API. This agent can collect the call sites that allocate or free direct memory in the application(NMT could not do it IMO) to detect direct memory leaks. In the beginning, it just prints all call sites, without any statistical function, it's hard to use. So we plan to use a way similar to jeprof (from jemalloc) to generate a report file that aggregates all useful information. During the implementation process, we found that we need a mechanism to notify the agent to generate reports. The common practice is: a) Register a service port, triggered by an HTTP request b) Triggered by signal c) Generate reports periodically, or when the process exits But these three ways have certain problems. For a) we need to introduce a network component, will increase the complexity of implementation For b) we cannot pass parameters For c) some files that may never be used will be generated Essentially, this question is how to notify the application to do a certain task, or in other words, how do we issue a command to the application. We believe that other Java developers will also encounter similar problems. (And sometimes there may be multiple unrelated dependent components in a Java application that require such a mechanism.) Naturally, we think that jcmd can already issue some commands registered in VM to the application, why can't we extend to the java level? 
This feature will be very useful for some lightweight tools, just like the scenario we encountered, to notify the tools to perform certain operations.

In addition, this feature will also benefit Java beginners. For example, beginners may not use advanced logging components at first, but they will still need to output debug logs. They may write code like this:

```
if (debug) {
  System.out.println("...");
}
```

If developers can easily control the value of debug, that is attractive. Like this:

```
Factory.register("MyApp.flipDebug", out -> debug = !debug);

jcmd MyApp.flipDebug
```

For mainstream frameworks, we can apply this feature to trigger some common activities, such as health checks, graceful shutdown, and dynamic configuration updates. But to be honest, these frameworks are very mature and stable, and for compatibility purposes, it's hard to get them to use this extension.

Comments welcome!

Thanks,
Denghui

From denghui.ddh at alibaba-inc.com Tue Nov 2 03:20:47 2021
From: denghui.ddh at alibaba-inc.com (Denghui Dong)
Date: Tue, 02 Nov 2021 11:20:47 +0800
Subject: Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd
In-Reply-To: <8a5c6087-a131-4ddd-9195-0f1e51705351.denghui.ddh@alibaba-inc.com>
References: <7f2df6ad-7d73-46ac-a23e-959fd6b4d4af.denghui.ddh@alibaba-inc.com>, <6581e9e4-851a-c562-37af-915ff3fdc492@oracle.com>, <8a5c6087-a131-4ddd-9195-0f1e51705351.denghui.ddh@alibaba-inc.com>
Message-ID: <967efbed-b345-462a-943c-c171b410cc21.denghui.ddh@alibaba-inc.com>

By the way, Erik mentioned that the DCmd commands in JFR are unlikely to use this extension. But I think some other VM commands could easily be replaced with this extension, such as RunFinalizationDCmd, FinalizerInfoDCmd, PrintSystemPropertiesDCmd, JMX-related DCmds, etc.

Denghui

------------------------------------------------------------------
From: Denghui Dong
Send Time: 2021-11-02 (Tue) 11:09
To: serviceability-dev; hotspot-dev; Chris Plummer
Subject: Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd

Hi Chris,

Thank you for the comments.

Yes, we have no good way to restrict user-registered commands to diagnosis-related operations only, but in my opinion, this does not seem to be a problem that must be solved perfectly.

The following are my thoughts.

This extension is an entry point that triggers the operation the user wants to perform (similar to the signal handler mechanism, but with a name and parameters). Even without this extension, users have other ways to achieve the same goal.

On the one hand, we could specify the intended usage scenarios of the API in the documentation. (Indeed, users can still write programs that do not follow the specification; for example, they can implement hashCode so that repeated calls on the same object return different values, or bring an object back to life while its finalize method is executing.) On the other hand, we can add some restrictions to help users make better use of this extension. For example, we can add a new VM option, such as EnableUserLevelDCmd, so that the application can only register custom commands when this option is enabled.

Or, from another perspective, can we allow users to do some non-diagnostic operations in custom commands?

Best,
Denghui

------------------------------------------------------------------
From: Chris Plummer
Send Time: 2021-11-02 (Tue) 03:35
To: Denghui Dong; serviceability-dev; hotspot-dev
Subject: Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd

I have similar concerns to those others have expressed, so I'll try to add something new to the discussion and not just repeat.

DCMDs have historically been very VM-centric. That's not to say they aren't useful for debugging applications, but they do so by providing VM-related info like stack traces, heap dumps, and class histograms.
Also, hotspot has been the gatekeeper for new DCMDs, meaning that new ones do not get added without going through the hotspot review process. Allowing any application or framework to add a DCMD changes this VM-centric view in a way that concerns me.

This approach allows a DCMD to pretty much do anything (Java security notwithstanding). App writers could even use them to provide a user-facing interface. For example, if an app has some sort of internal database, it could allow users to query it via a DCMD, and maybe even suggest that users write simple shell scripts that use jcmd to do these queries. Allowing this type of non-diagnostic usage seems like a path we don't want to go down, yet I don't see how it can be prevented once you allow applications to add DCMDs.

Chris

On 10/25/21 1:37 AM, Denghui Dong wrote:

Hi there!

We'd like to discuss a proposal for extending the current DCmd framework to support Java-level DCmds.

At present, DCmd only allows the VM to register commands, which can be invoked through jcmd or JMX. It would be beneficial if users could create their own commands.

The idea for this extension originally came from our internal Java agent that detects misuse of the Unsafe API. This agent can collect the call sites that allocate or free direct memory in the application (NMT cannot do this, IMO) to detect direct memory leaks. In the beginning, it just printed all call sites without any statistical function, which made it hard to use. So we plan to take an approach similar to jeprof (from jemalloc) and generate a report file that aggregates all the useful information.

During the implementation, we found that we need a mechanism to notify the agent to generate reports. The common practices are:

a) Register a service port, triggered by an HTTP request
b) Triggered by signal
c) Generate reports periodically, or when the process exits

But these three ways have certain problems.
For a) we need to introduce a network component, which increases the complexity of the implementation
For b) we cannot pass parameters
For c) some files that may never be used will be generated

Essentially, the question is how to notify the application to do a certain task, or in other words, how do we issue a command to the application. We believe that other Java developers will also encounter similar problems. (And sometimes there may be multiple unrelated dependent components in a Java application that require such a mechanism.)

Naturally, we think that since jcmd can already issue commands registered in the VM to the application, why can't we extend this to the Java level?

This feature will be very useful for some lightweight tools, just like the scenario we encountered, to notify the tools to perform certain operations.

In addition, this feature will also benefit Java beginners. For example, beginners may not use advanced logging components at first, but they will still need to output debug logs. They may write code like this:

```
if (debug) {
  System.out.println("...");
}
```

If developers can easily control the value of debug, that is attractive. Like this:

```
Factory.register("MyApp.flipDebug", out -> debug = !debug);

jcmd MyApp.flipDebug
```

For mainstream frameworks, we can apply this feature to trigger some common activities, such as health checks, graceful shutdown, and dynamic configuration updates. But to be honest, these frameworks are very mature and stable, and for compatibility purposes, it's hard to get them to use this extension.

Comments welcome!
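
To make the `Factory.register` idea above a little more concrete, here is a minimal sketch of a Java-level command registry. Everything in it is hypothetical: the `CommandRegistry` class, its `register`/`execute` methods, and the `Consumer<PrintStream>` handler type are assumptions made for illustration, not the API of the proposal or of the existing DCmd framework. It only shows the name-to-handler mapping that a jcmd front end could dispatch into.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical stand-in for the proposed registration facility; not a JDK API.
final class CommandRegistry {
    private static final Map<String, Consumer<PrintStream>> COMMANDS = new ConcurrentHashMap<>();

    private CommandRegistry() {}

    // Registers a handler under a unique name; duplicate names are rejected.
    static void register(String name, Consumer<PrintStream> handler) {
        if (COMMANDS.putIfAbsent(name, handler) != null) {
            throw new IllegalStateException("Command already registered: " + name);
        }
    }

    // Runs the named command and returns whatever it printed.
    static String execute(String name) {
        Consumer<PrintStream> handler = COMMANDS.get(name);
        if (handler == null) {
            throw new IllegalArgumentException("Unknown command: " + name);
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buffer, true)) {
            handler.accept(out);
        }
        return buffer.toString();
    }
}

// Example application state controlled through a registered command.
final class DebugToggle {
    static volatile boolean debug = false;

    private DebugToggle() {}

    static void install() {
        CommandRegistry.register("MyApp.flipDebug", out -> {
            debug = !debug;
            out.println("debug=" + debug);
        });
    }
}
```

In this sketch the handler receives an output stream, so a command can report back to whoever invoked it; how arguments would be passed is left out on purpose.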
Thanks,
Denghui

From shade at openjdk.java.net Tue Nov 2 06:25:33 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 2 Nov 2021 06:25:33 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v3]
In-Reply-To:
References:
Message-ID:

> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has interesting consequence for matchers: we can match the native `StrictMath.sqrt` to non-native intrinsic for `Math.sqrt`. Interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
>
> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
>
> There seem to be no performance regressions with this patch at least on Linux x86_64:
>
>
> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench"
>
> Benchmark                    Mode  Cnt       Score      Error   Units
>
> ### Before
>
> StrictMathBench.minDouble   thrpt    4  230921.558 ±  234.238  ops/ms
> StrictMathBench.minFloat    thrpt    4  230932.303 ±  126.721  ops/ms
> StrictMathBench.minInt      thrpt    4  230917.256 ±   73.008  ops/ms
> StrictMathBench.minLong     thrpt    4  194460.828 ±  178.079  ops/ms
>
>
> StrictMathBench.maxDouble   thrpt    4  230983.180 ±  161.211  ops/ms
> StrictMathBench.maxFloat    thrpt    4  230969.290 ±  277.500  ops/ms
> StrictMathBench.maxInt      thrpt    4  231033.581 ±  200.015  ops/ms
> StrictMathBench.maxLong     thrpt    4  194590.744 ±  114.295  ops/ms
>
>
> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
>
> ### After
>
> StrictMathBench.minDouble   thrpt    4  230976.625 ±   67.338  ops/ms
> StrictMathBench.minFloat    thrpt    4  230896.021 ±  270.434  ops/ms
> StrictMathBench.minInt      thrpt    4  230859.741 ±  403.147  ops/ms
> StrictMathBench.minLong     thrpt    4  194456.673 ±  111.557  ops/ms
>
> StrictMathBench.maxDouble   thrpt    4  230890.776 ±   89.924  ops/ms
> StrictMathBench.maxFloat    thrpt    4  230918.334 ±   63.160  ops/ms
> StrictMathBench.maxInt      thrpt    4  231059.128 ±   51.224  ops/ms
> StrictMathBench.maxLong     thrpt    4  194488.210 ±  495.224  ops/ms
>
> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ±  247.330  ops/ms
>
>
> Additional testing:
>  - [x] `StrictMath` benchmarks
>  - [x] Linux x86_64 fastdebug `tier1`

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Keep intrinsics on StrictMath

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6184/files
  - new: https://git.openjdk.java.net/jdk/pull/6184/files/27202fa4..005cace6

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6184&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6184&range=01-02

  Stats: 67 lines in 5 files changed: 55 ins; 5 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6184.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6184/head:pull/6184

PR: https://git.openjdk.java.net/jdk/pull/6184

From shade at openjdk.java.net Tue Nov 2 06:25:33 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 2 Nov 2021 06:25:33 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v2]
In-Reply-To:
References:
Message-ID: <68cTNOLxxPPW5cJFydBuPv56t_UUdGQi-F0yTT9x2zE=.55f3ab68-ab5c-41f6-8b0d-0c29e4c680b1@github.com>

On Mon, 1 Nov 2021 22:59:10 GMT, Vladimir Kozlov wrote:

> Yes, I am fine with new intrinsics for them.

All right, see new commit then.
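
As a small Java-level illustration of the point in the PR description that `StrictMath` and `Math` can share handling for these methods, the sketch below checks that the two classes agree bit-for-bit for `min`, `max` and `sqrt` on a handful of inputs. This is not the benchmark or any test from the PR; the class name and the sample values are assumptions chosen for illustration. `sqrt` is expected to agree because both methods are specified to return the correctly rounded result.

```java
// Illustrative check only; not part of the PR's tests.
final class StrictMathAgreement {
    // Sample inputs, including signed zeros, NaN, infinities and a negative value.
    static final double[] SAMPLES = {
        0.0, -0.0, 1.0, 2.0, 0.1, 1e-300, 1e300,
        Double.MIN_VALUE, Double.MAX_VALUE,
        Double.NaN, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, -4.0
    };

    private StrictMathAgreement() {}

    // Compares by bit pattern so that -0.0 and 0.0 differ and NaN equals NaN.
    static boolean sameBits(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    // Returns true if StrictMath and Math agree on every sample (pair).
    static boolean agrees() {
        for (double x : SAMPLES) {
            if (!sameBits(StrictMath.sqrt(x), Math.sqrt(x))) {
                return false;
            }
            for (double y : SAMPLES) {
                if (!sameBits(StrictMath.min(x, y), Math.min(x, y))) {
                    return false;
                }
                if (!sameBits(StrictMath.max(x, y), Math.max(x, y))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```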
-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From shade at openjdk.java.net Tue Nov 2 10:29:16 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 2 Nov 2021 10:29:16 GMT
Subject: RFR: 8252990: Intrinsify Unsafe.storeStoreFence [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 28 Oct 2021 08:58:48 GMT, Aleksey Shipilev wrote:

>> `Unsafe.storeStoreFence` currently delegates to stronger `Unsafe.storeFence`. We can teach compilers to map this directly to already existing rules that handle `MemBarStoreStore`. Like explicit `LoadFence`/`StoreFence`, we introduce the special node to differentiate explicit fence and implicit store-store barriers. `storeStoreFence` is usually used to simulate safe `final`-field like constructions in special JDK classes, like `ConstantCallSite` and friends.
>>
>> Motivational performance difference on benchmarks from JDK-8276054 on ARM32 (Raspberry Pi 4):
>>
>>
>> Benchmark                      Mode  Cnt   Score   Error  Units
>> Multiple.plain                 avgt    3   2.669 ± 0.004  ns/op
>> Multiple.release               avgt    3  16.688 ± 0.057  ns/op
>> Multiple.storeStore            avgt    3  14.021 ± 0.144  ns/op  // Better
>>
>> MultipleWithLoads.plain        avgt    3   4.672 ± 0.053  ns/op
>> MultipleWithLoads.release      avgt    3  16.689 ± 0.044  ns/op
>> MultipleWithLoads.storeStore   avgt    3  14.012 ± 0.010  ns/op  // Better
>>
>> MultipleWithStores.plain       avgt    3  14.687 ± 0.009  ns/op
>> MultipleWithStores.release     avgt    3  45.393 ± 0.192  ns/op
>> MultipleWithStores.storeStore  avgt    3  38.048 ± 0.033  ns/op  // Better
>>
>> Publishing.plain               avgt    3  27.079 ± 0.201  ns/op
>> Publishing.release             avgt    3  27.088 ± 0.241  ns/op
>> Publishing.storeStore          avgt    3  27.009 ± 0.259  ns/op  // Within error, hidden by allocation
>>
>> Single.plain                   avgt    3   2.670 ± 0.002  ns/op
>> Single.releaseFence            avgt    3   6.675 ± 0.001  ns/op
>> Single.storeStoreFence         avgt    3   8.012 ± 0.027  ns/op  // Worse, seems to be ARM32 implementation artifact
>>
>>
>> The same thing on AArch64 (Raspberry Pi 3):
>>
>>
>> Benchmark                      Mode  Cnt   Score   Error  Units
>>
>> Multiple.plain                 avgt    3   5.914 ± 0.115  ns/op
>> Multiple.release               avgt    3  10.149 ± 0.059  ns/op
>> Multiple.storeStore            avgt    3   6.757 ± 0.138  ns/op  // Better
>>
>> MultipleWithLoads.plain        avgt    3  11.849 ± 0.331  ns/op
>> MultipleWithLoads.release      avgt    3  35.565 ± 1.144  ns/op
>> MultipleWithLoads.storeStore   avgt    3  19.441 ± 0.471  ns/op  // Better
>>
>> MultipleWithStores.plain       avgt    3   5.920 ± 0.213  ns/op
>> MultipleWithStores.release     avgt    3  20.286 ± 0.347  ns/op
>> MultipleWithStores.storeStore  avgt    3  12.686 ± 0.230  ns/op  // Better
>>
>> Publishing.plain               avgt    3  22.261 ± 1.630  ns/op
>> Publishing.release             avgt    3  22.269 ± 0.576  ns/op
>> Publishing.storeStore          avgt    3  17.464 ± 0.397  ns/op  // Better
>>
>> Single.plain                   avgt    3   5.916 ± 0.063  ns/op
>> Single.release                 avgt    3  10.148 ± 0.401  ns/op
>> Single.storeStore              avgt    3   6.767 ± 0.164  ns/op  // Better
>>
>>
>> As expected, this does not affect x86_64 at all, because both `release` and `storeStore` are effectively no-ops, only affecting compiler optimizations:
>>
>>
>> Benchmark                      Mode  Cnt   Score   Error  Units
>>
>> Multiple.plain                 avgt    3   0.406 ± 0.002  ns/op
>> Multiple.release               avgt    3   0.409 ± 0.018  ns/op
>> Multiple.storeStore            avgt    3   0.406 ± 0.001  ns/op
>>
>> MultipleWithLoads.plain        avgt    3   4.328 ± 0.006  ns/op
>> MultipleWithLoads.release      avgt    3   4.600 ± 0.014  ns/op
>> MultipleWithLoads.storeStore   avgt    3   4.602 ± 0.006  ns/op
>>
>> MultipleWithStores.plain       avgt    3   0.812 ± 0.001  ns/op
>> MultipleWithStores.release     avgt    3   0.812 ± 0.002  ns/op
>> MultipleWithStores.storeStore  avgt    3   0.812 ± 0.002  ns/op
>>
>> Publishing.plain               avgt    3   6.370 ± 0.059  ns/op
>> Publishing.release             avgt    3   6.358 ± 0.436  ns/op
>> Publishing.storeStore          avgt    3   6.367 ± 0.054  ns/op
>>
>> Single.plain                   avgt    3   0.407 ± 0.039  ns/op
>> Single.releaseFence            avgt    3   0.406 ± 0.001  ns/op
>> Single.storeStoreFence         avgt    3   0.406 ± 0.001  ns/op
>>
>>
>> Additional testing:
>>  - [x] Linux x86_64 fastdebug `tier1`
>>  - [x] Linux AArch64 fastdebug `tier1`
>>  - [x] Linux x86_64 Fences benchmark
>>  - [x] Linux AArch64 Fences benchmark
>>  - [x] Linux ARM32 Fences benchmark
>>  - [x] Linux AArch64 jcstress `quick` run
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix the comment to match JDK-8276096

jcstress and tier1 passes on AArch64. Seems like we are good to go.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6136

From shade at openjdk.java.net Tue Nov 2 10:29:17 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 2 Nov 2021 10:29:17 GMT
Subject: Integrated: 8252990: Intrinsify Unsafe.storeStoreFence
In-Reply-To:
References:
Message-ID:

On Wed, 27 Oct 2021 11:53:47 GMT, Aleksey Shipilev wrote:

> `Unsafe.storeStoreFence` currently delegates to stronger `Unsafe.storeFence`. We can teach compilers to map this directly to already existing rules that handle `MemBarStoreStore`. Like explicit `LoadFence`/`StoreFence`, we introduce the special node to differentiate explicit fence and implicit store-store barriers. `storeStoreFence` is usually used to simulate safe `final`-field like constructions in special JDK classes, like `ConstantCallSite` and friends.
>
> Motivational performance difference on benchmarks from JDK-8276054 on ARM32 (Raspberry Pi 4):
>
>
> Benchmark                      Mode  Cnt   Score   Error  Units
> Multiple.plain                 avgt    3   2.669 ± 0.004  ns/op
> Multiple.release               avgt    3  16.688 ± 0.057  ns/op
> Multiple.storeStore            avgt    3  14.021 ± 0.144  ns/op  // Better
>
> MultipleWithLoads.plain        avgt    3   4.672 ± 0.053  ns/op
> MultipleWithLoads.release      avgt    3  16.689 ± 0.044  ns/op
> MultipleWithLoads.storeStore   avgt    3  14.012 ± 0.010  ns/op  // Better
>
> MultipleWithStores.plain       avgt    3  14.687 ± 0.009  ns/op
> MultipleWithStores.release     avgt    3  45.393 ± 0.192  ns/op
> MultipleWithStores.storeStore  avgt    3  38.048 ± 0.033  ns/op  // Better
>
> Publishing.plain               avgt    3  27.079 ± 0.201  ns/op
> Publishing.release             avgt    3  27.088 ± 0.241  ns/op
> Publishing.storeStore          avgt    3  27.009 ± 0.259  ns/op  // Within error, hidden by allocation
>
> Single.plain                   avgt    3   2.670 ± 0.002  ns/op
> Single.releaseFence            avgt    3   6.675 ± 0.001  ns/op
> Single.storeStoreFence         avgt    3   8.012 ± 0.027  ns/op  // Worse, seems to be ARM32 implementation artifact
>
>
> The same thing on AArch64 (Raspberry Pi 3):
>
>
> Benchmark                      Mode  Cnt   Score   Error  Units
>
> Multiple.plain                 avgt    3   5.914 ± 0.115  ns/op
> Multiple.release               avgt    3  10.149 ± 0.059  ns/op
> Multiple.storeStore            avgt    3   6.757 ± 0.138  ns/op  // Better
>
> MultipleWithLoads.plain        avgt    3  11.849 ± 0.331  ns/op
> MultipleWithLoads.release      avgt    3  35.565 ± 1.144  ns/op
> MultipleWithLoads.storeStore   avgt    3  19.441 ± 0.471  ns/op  // Better
>
> MultipleWithStores.plain       avgt    3   5.920 ± 0.213  ns/op
> MultipleWithStores.release     avgt    3  20.286 ± 0.347  ns/op
> MultipleWithStores.storeStore  avgt    3  12.686 ± 0.230  ns/op  // Better
>
> Publishing.plain               avgt    3  22.261 ± 1.630  ns/op
> Publishing.release             avgt    3  22.269 ± 0.576  ns/op
> Publishing.storeStore          avgt    3  17.464 ± 0.397  ns/op  // Better
>
> Single.plain                   avgt    3   5.916 ± 0.063  ns/op
> Single.release                 avgt    3  10.148 ± 0.401  ns/op
> Single.storeStore              avgt    3   6.767 ± 0.164  ns/op  // Better
>
>
> As expected, this does not affect x86_64 at all, because both `release` and `storeStore` are effectively no-ops, only affecting compiler optimizations:
>
>
> Benchmark                      Mode  Cnt   Score   Error  Units
>
> Multiple.plain                 avgt    3   0.406 ± 0.002  ns/op
> Multiple.release               avgt    3   0.409 ± 0.018  ns/op
> Multiple.storeStore            avgt    3   0.406 ± 0.001  ns/op
>
> MultipleWithLoads.plain        avgt    3   4.328 ± 0.006  ns/op
> MultipleWithLoads.release      avgt    3   4.600 ± 0.014  ns/op
> MultipleWithLoads.storeStore   avgt    3   4.602 ± 0.006  ns/op
>
> MultipleWithStores.plain       avgt    3   0.812 ± 0.001  ns/op
> MultipleWithStores.release     avgt    3   0.812 ± 0.002  ns/op
> MultipleWithStores.storeStore  avgt    3   0.812 ± 0.002  ns/op
>
> Publishing.plain               avgt    3   6.370 ± 0.059  ns/op
> Publishing.release             avgt    3   6.358 ± 0.436  ns/op
> Publishing.storeStore          avgt    3   6.367 ± 0.054  ns/op
>
> Single.plain                   avgt    3   0.407 ± 0.039  ns/op
> Single.releaseFence            avgt    3   0.406 ± 0.001  ns/op
> Single.storeStoreFence         avgt    3   0.406 ± 0.001  ns/op
>
>
> Additional testing:
>  - [x] Linux x86_64 fastdebug `tier1`
>  - [x] Linux AArch64 fastdebug `tier1`
>  - [x] Linux x86_64 Fences benchmark
>  - [x] Linux AArch64 Fences benchmark
>  - [x] Linux ARM32 Fences benchmark
>  - [x] Linux AArch64 jcstress `quick` run

This pull request has now been integrated.

Changeset: b7a06be9
Author:    Aleksey Shipilev
URL:       https://git.openjdk.java.net/jdk/commit/b7a06be98d3057dac4adbb7f4071ac62cf88fe52
Stats:     38 lines in 16 files changed: 32 ins; 5 del; 1 mod

8252990: Intrinsify Unsafe.storeStoreFence

Reviewed-by: dholmes, thartmann, whuang

-------------

PR: https://git.openjdk.java.net/jdk/pull/6136

From mcimadamore at openjdk.java.net Tue Nov 2 10:34:18 2021
From: mcimadamore at openjdk.java.net (Maurizio Cimadamore)
Date: Tue, 2 Nov 2021 10:34:18 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v10]
In-Reply-To:
References:
Message-ID:

On Tue, 2 Nov 2021 00:24:12 GMT, Paul Sandoz wrote:

>> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>>
>>  - Add cache for memory address var handles
>>  - Merge branch 'master' into JEP-419
>>  - Fix regression in VaList treatment on AArch64 (contributed by @nick-arm)
>>  - Merge branch 'master' into JEP-419
>>  - Fix copyright header in TestArrayCopy
>>  - Fix failing microbenchmarks. Contributed by @FrauBoes (thanks!)
>>  - * use `invokeWithArguments` to simplify new test
>>  - Add test for liveness check with high-aririty downcalls
>>    (make sure that if an exception occurs in a downcall because of liveness,
>>    ref count of other resources are left intact).
>>  - * Fix javadoc issue in VaList
>>    * Fix bug in concurrent logic for shared scope acquire
>>  - Address review comments
>>  - ... and 7 more: https://git.openjdk.java.net/jdk/compare/5bb1992b...9b519343
>
> src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/Utils.java line 111:
>
>> 109:     class VarHandleCache {
>> 110:         private static final Map handleMap = new ConcurrentHashMap<>();
>> 111:         private static final Map handleMapNoAlignCheck = new ConcurrentHashMap<>();
>
> Something to consider later if this is an issue. Since the number of `ValueLayout` instances is fixed, carrier x order = 18, we can use stable arrays with ordinals on the instances.

What about alignment?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From psandoz at openjdk.java.net Tue Nov 2 15:44:12 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Tue, 2 Nov 2021 15:44:12 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v12]
In-Reply-To:
References:
Message-ID:

On Mon, 1 Nov 2021 22:36:40 GMT, Maurizio Cimadamore wrote:

>> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>>
>> [1] - https://openjdk.java.net/jeps/419
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Tweak javadoc of loaderLookup

Marked as reviewed by psandoz (Reviewer).
-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From psandoz at openjdk.java.net Tue Nov 2 15:44:13 2021
From: psandoz at openjdk.java.net (Paul Sandoz)
Date: Tue, 2 Nov 2021 15:44:13 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v10]
In-Reply-To:
References:
Message-ID: <5onID0SnzIoPH5_Le4f71eC5ll_zGn0DQecQVpL1jDM=.43d7b2af-185a-4251-828f-058da6a69115@github.com>

On Tue, 2 Nov 2021 10:30:42 GMT, Maurizio Cimadamore wrote:

>> src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/Utils.java line 111:
>>
>>> 109:     class VarHandleCache {
>>> 110:         private static final Map handleMap = new ConcurrentHashMap<>();
>>> 111:         private static final Map handleMapNoAlignCheck = new ConcurrentHashMap<>();
>>
>> Something to consider later if this is an issue. Since the number of `ValueLayout` instances is fixed, carrier x order = 18, we can use stable arrays with ordinals on the instances.
>
> What about alignment?

Drat, `skipAlignmentCheck` misled me but perhaps there is still benefit for common constants with 8 bit and size alignment and fallback otherwise.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5907

From kvn at openjdk.java.net Tue Nov 2 17:11:09 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 2 Nov 2021 17:11:09 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 2 Nov 2021 06:25:33 GMT, Aleksey Shipilev wrote:

>> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has interesting consequence for matchers: we can match the native `StrictMath.sqrt` to non-native intrinsic for `Math.sqrt`. Interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
>>
>> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
>>
>> There seem to be no performance regressions with this patch at least on Linux x86_64:
>>
>>
>> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench"
>>
>> Benchmark                    Mode  Cnt       Score      Error   Units
>>
>> ### Before
>>
>> StrictMathBench.minDouble   thrpt    4  230921.558 ±  234.238  ops/ms
>> StrictMathBench.minFloat    thrpt    4  230932.303 ±  126.721  ops/ms
>> StrictMathBench.minInt      thrpt    4  230917.256 ±   73.008  ops/ms
>> StrictMathBench.minLong     thrpt    4  194460.828 ±  178.079  ops/ms
>>
>>
>> StrictMathBench.maxDouble   thrpt    4  230983.180 ±  161.211  ops/ms
>> StrictMathBench.maxFloat    thrpt    4  230969.290 ±  277.500  ops/ms
>> StrictMathBench.maxInt      thrpt    4  231033.581 ±  200.015  ops/ms
>> StrictMathBench.maxLong     thrpt    4  194590.744 ±  114.295  ops/ms
>>
>>
>> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
>>
>> ### After
>>
>> StrictMathBench.minDouble   thrpt    4  230976.625 ±   67.338  ops/ms
>> StrictMathBench.minFloat    thrpt    4  230896.021 ±  270.434  ops/ms
>> StrictMathBench.minInt      thrpt    4  230859.741 ±  403.147  ops/ms
>> StrictMathBench.minLong     thrpt    4  194456.673 ±  111.557  ops/ms
>>
>> StrictMathBench.maxDouble   thrpt    4  230890.776 ±   89.924  ops/ms
>> StrictMathBench.maxFloat    thrpt    4  230918.334 ±   63.160  ops/ms
>> StrictMathBench.maxInt      thrpt    4  231059.128 ±   51.224  ops/ms
>> StrictMathBench.maxLong     thrpt    4  194488.210 ±  495.224  ops/ms
>>
>> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ±  247.330  ops/ms
>>
>>
>> Additional testing:
>>  - [x] `StrictMath` benchmarks
>>  - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Keep intrinsics on StrictMath

Good. Thank you for fixing it.
-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6184

From jvernee at openjdk.java.net Tue Nov 2 17:32:24 2021
From: jvernee at openjdk.java.net (Jorn Vernee)
Date: Tue, 2 Nov 2021 17:32:24 GMT
Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v10]
In-Reply-To:
References:
Message-ID:

On Mon, 1 Nov 2021 12:05:32 GMT, Maurizio Cimadamore wrote:

>> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment.
>>
>> [1] - https://openjdk.java.net/jeps/419
>
> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>
>  - Add cache for memory address var handles
>  - Merge branch 'master' into JEP-419
>  - Fix regression in VaList treatment on AArch64 (contributed by @nick-arm)
>  - Merge branch 'master' into JEP-419
>  - Fix copyright header in TestArrayCopy
>  - Fix failing microbenchmarks. Contributed by @FrauBoes (thanks!)
>  - * use `invokeWithArguments` to simplify new test
>  - Add test for liveness check with high-aririty downcalls
>    (make sure that if an exception occurs in a downcall because of liveness,
>    ref count of other resources are left intact).
>  - * Fix javadoc issue in VaList
>    * Fix bug in concurrent logic for shared scope acquire
>  - Address review comments
>  - ... and 7 more: https://git.openjdk.java.net/jdk/compare/5bb1992b...9b519343

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1586:

> 1584:             public void ensureCustomized(MethodHandle mh) {
> 1585:                 mh.customize();
> 1586:             }

This is no longer needed, but it probably got picked up in the merge.
src/java.base/share/classes/jdk/internal/access/JavaLangInvokeAccess.java line 144:

> 142:      * @param mh the method handle
> 143:      */
> 144:     void ensureCustomized(MethodHandle mh);

Same here, no longer needed. (it was used by now removed upcall handler code. See https://github.com/openjdk/panama-foreign/pull/553)

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemoryAddress.java line 107:

> 105:      *
> 106:      * @param offset offset in bytes (relative to this address). The final address of this read operation can be expressed as {@code toRowLongValue() + offset}.
> 107:      * @return a Java UTF-8 string containing all the bytes read from the given starting address ({@code toRowLongValue() + offset})

(see also comment on MemorySegment.getUtf8String)
Suggestion:

     * @return a Java string constructed from the bytes read from the given starting address ({@code toRowLongValue() + offset})

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 387:

> 385:
> 386:     /**
> 387:      * Performs an element-wise bulk copy from given source segment to this segment. More specifically, the bytes at

Suggestion:

     * Performs a byte-wise bulk copy from given source segment to this segment. More specifically, the bytes at

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 400:

> 398:      * a multiple of the source element layout size, if the source segment is incompatible with the alignment constraints
> 399:      * in the source element layout, or if this segment is incompatible with the alignment constraints
> 400:      * in the destination element layout.

This speaks about element layouts, but I don't see any element layouts in the method implementation.

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 633:

> 631:      * java.nio.charset.CharsetDecoder} class should be used when more control
> 632:      * over the decoding process is required.
> 633:      * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative()} segment,

Suggestion:

     * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative() native} segment,

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 636:

> 634:      *               the final address of this read operation can be expressed as {@code address().toRowLongValue() + offset}.
> 635:      * @return a Java UTF-8 string containing all the bytes read from the given starting address up to (but not including)
> 636:      * the first {@code '\0'} terminator character (assuming one is found).

The phrase "a Java UTF-8 string" sounds strange to me, as Java Strings are not encoded in UTF-8. The string that is read is UTF-8 encoded, but then it is converted from UTF-8 to Java internal String encoding (UTF-16 or Latin1). I'd suggest just dropping the 'UTF-8', and changing 'containing all' to 'constructed from'.
Suggestion:

     * @return a Java string constructed from the bytes read from the given starting address up to (but not including)
     * the first {@code '\0'} terminator character (assuming one is found).

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 652:

> 650:      * java.nio.charset.CharsetDecoder} class should be used when more control
> 651:      * over the decoding process is required.
> 652:      * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative()} segment,

Suggestion:

     * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative() native} segment,

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 762:

> 760:
> 761:     /**
> 762:      * Creates a new native memory segment with given size and resource scope, and whose base address is this address.

Suggestion:

     * Creates a new native memory segment with given size and resource scope, and whose base address is the given address.
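
The distinction drawn in the `getUtf8String` comment above (the bytes in memory are UTF-8 encoded, while the resulting `String` is not) can be sketched with plain JDK classes. The helper below is an assumption for illustration and is not the incubator implementation: it works on a `byte[]` rather than a memory segment, scans from an offset up to the first `'\0'`, and decodes those bytes from UTF-8 into a Java `String`. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper; the class and method names are hypothetical.
final class Utf8Strings {
    private Utf8Strings() {}

    // Decodes the UTF-8 bytes in [offset, first NUL) into a Java String.
    static String readNulTerminated(byte[] bytes, int offset) {
        if (offset < 0 || offset > bytes.length) {
            throw new IndexOutOfBoundsException("offset out of range: " + offset);
        }
        int end = offset;
        while (end < bytes.length && bytes[end] != 0) {
            end++;
        }
        if (end == bytes.length) {
            throw new IllegalArgumentException("No NUL terminator found");
        }
        // The String constructor converts from UTF-8 to the JVM's internal representation.
        return new String(bytes, offset, end - offset, StandardCharsets.UTF_8);
    }
}
```

Note that the terminator itself is excluded from the result, matching the "up to (but not including)" wording in the javadoc under review.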
src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 769:

> 767:      * provided resource scope.
> 768:      *
> 769:      * Clients should ensure that the address and bounds refers to a valid region of memory that is accessible for reading and,

Suggestion:

     * Clients should ensure that the address and bounds refer to a valid region of memory that is accessible for reading and,

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 1035:

> 1033:      *
> 1034:      * @param layout the layout of the memory region to be read.
> 1035:      * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative()} segment,

Suggestion:

     * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative() native} segment,

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 1549:

> 1547:      * @param index index (relative to this segment). For instance, if this segment is a {@link #isNative()} segment,
> 1548:      *              the final address of this write operation can be expressed as {@code address().toRowLongValue() + (index * layout.byteSize())}.
> 1549:      * @param value the byte value to be written.

Suggestion:

     * @param value the address value to be written.

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 1563:

> 1561:      * Copies a number of elements from a source segment to a destination array,
> 1562:      * starting at a given segment offset (expressed in bytes), and a given array index, using the given source element layout.
> 1563:      * Supported array types are {@code byte[]}, {@code char[]},{@code short[]},{@code int[]},{@code float[]},{@code long[]} and {@code double[]}.

Suggestion:

     * Supported array types are {@code byte[]}, {@code char[]}, {@code short[]}, {@code int[]}, {@code float[]}, {@code long[]} and {@code double[]}.
src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 1604: > 1602: * Copies a number of elements from a source array to a destination segment, > 1603: * starting at a given array index, and a given segment offset (expressed in bytes), using the given destination element layout. > 1604: * Supported array types are {@code byte[]}, {@code char[]},{@code short[]},{@code int[]},{@code float[]},{@code long[]} and {@code double[]}. Suggestion: * Supported array types are {@code byte[]}, {@code char[]}, {@code short[]}, {@code int[]}, {@code float[]}, {@code long[]} and {@code double[]}. src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/ResourceScope.java line 208: > 206: */ > 207: static ResourceScope newConfinedScope() { > 208: return ResourceScopeImpl.createConfined( Thread.currentThread(), null); Suggestion: return ResourceScopeImpl.createConfined(Thread.currentThread(), null); src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/VaList.java line 132: > 130: /** > 131: * Copies this variable argument list at its current position into a new variable argument list associated > 132: * with the same scope as this variable argument list. using the segment provided allocator. Copying is useful to I think ". using the segment provided allocator" can be removed. Seems like a leftover from when we had an overload that took an allocator. Suggestion: * with the same scope as this variable argument list. Copying is useful to ------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From jvernee at openjdk.java.net Tue Nov 2 17:32:17 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 2 Nov 2021 17:32:17 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v12] In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 22:36:40 GMT, Maurizio Cimadamore wrote: >> This PR contains the API and implementation changes for JEP-419 [1]. 
A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.java.net/jeps/419 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Tweak javadoc of loaderLookup Mostly some minor javadoc comments. src/java.base/share/classes/java/lang/Module.java line 32: > 30: import java.lang.annotation.Annotation; > 31: import java.lang.invoke.MethodHandle; > 32: import java.lang.invoke.VarHandle; These imports seem spurious now. src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/ValueLayout.java line 177: > 175: } > 176: if (carrier.isPrimitive() && Wrapper.forPrimitiveType(carrier).bitWidth() != size && > 177: carrier != boolean.class && size != 8) { I find this condition hard to parse, I'd suggest re-writing it as: if (carrier.isPrimitive()) { long expectedSize = carrier == boolean.class ? 8 : Wrapper.forPrimitiveType(carrier).bitWidth(); if (size != expectedSize) { throw ... } } (Maybe even change the `if` to an `else` and combine it with the above if). src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/ValueLayout.java line 484: > 482: public static final class OfAddress extends ValueLayout { > 483: OfAddress(ByteOrder order) { > 484: super(MemoryAddress.class, order, Unsafe.ADDRESS_SIZE * 8); I see `Unsafe.ADDRESS_SIZE` used in several places, suggest to maybe add an `ADDRESS_SIZE_BITS` constants somewhere (it's a bit more readable). src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ArenaAllocator.java line 42: > 40: final long blockSize; > 41: final long arenaSize; > 42: final ResourceScope scope; Could these field be made private? src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ArenaAllocator.java line 88: > 86: if (size > arenaSize) { > 87: throw new OutOfMemoryError(); > 88: } Isn't this already covered by the `finally` block? 
Also, this seems to be checking the unaltered `size`, which I think should have been already done at the end of the previous `allocate` call right? src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ResourceScopeImpl.java line 122: > 120: ResourceScopeImpl targetImpl = (ResourceScopeImpl)target; > 121: targetImpl.acquire0(); > 122: addCloseAction(targetImpl::release0); Maybe this should explicitly check if target is `null` (though the call to `acquire0` would also produce an NPE, the stack trace having Objects::requireNonNull in there would make the error more obvious I think). Suggestion: public void keepAlive(ResourceScope target) { Objects.requireNonNull(target); if (target == this) { throw new IllegalArgumentException("Invalid target scope."); } ResourceScopeImpl targetImpl = (ResourceScopeImpl)target; targetImpl.acquire0(); addCloseAction(targetImpl::release0); src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/SharedScope.java line 101: > 99: int value; > 100: do { > 101: value = (int) STATE.getVolatile(jdk.internal.foreign.SharedScope.this); Doesn't need to be fully qualified I think? Suggestion: value = (int) STATE.getVolatile(this); src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/SharedScope.java line 106: > 104: throw new IllegalStateException("Already closed"); > 105: } > 106: } while (!STATE.compareAndSet(jdk.internal.foreign.SharedScope.this, value, value - 1)); Same here Suggestion: } while (!STATE.compareAndSet(this, value, value - 1)); ------------- Marked as reviewed by jvernee (Reviewer). 
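The release loop quoted in the review above can be sketched in isolation. This is a minimal sketch, not the JDK's `SharedScope`: the class `RefCounter`, its `CLOSED` sentinel and the `value <= 0` guard are assumptions made for illustration, since the review quotes only part of the loop. It shows that, when the loop sits directly in the class that owns the field, plain `this` is enough as the `VarHandle` receiver.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical stand-in for a shared scope's reference counter; not the JDK class.
final class RefCounter {
    private static final int CLOSED = -1;   // assumed sentinel for "already closed"
    private static final VarHandle STATE;

    static {
        try {
            STATE = MethodHandles.lookup().findVarHandle(RefCounter.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private int state;   // only accessed through STATE

    void acquire() {
        int value;
        do {
            value = (int) STATE.getVolatile(this);
            if (value == CLOSED) {
                throw new IllegalStateException("Already closed");
            }
        } while (!STATE.compareAndSet(this, value, value + 1));
    }

    void release() {
        int value;
        do {
            // Plain `this` is the receiver; no fully qualified `Outer.this` is needed here.
            value = (int) STATE.getVolatile(this);
            if (value <= 0) {   // assumed guard: nothing left to release
                throw new IllegalStateException("Already closed");
            }
        } while (!STATE.compareAndSet(this, value, value - 1));
    }

    int count() {
        return (int) STATE.getVolatile(this);
    }
}
```

The compare-and-set retry means a concurrent update between the read and the write only costs another loop iteration; the counter itself is never updated from a stale value.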
PR: https://git.openjdk.java.net/jdk/pull/5907 From jvernee at openjdk.java.net Tue Nov 2 17:32:25 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 2 Nov 2021 17:32:25 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v10] In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 15:38:18 GMT, Jorn Vernee wrote: >> Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: >> >> - Add cache for memory address var handles >> - Merge branch 'master' into JEP-419 >> - Fix regression in VaList treatment on AArch64 (contributed by @nick-arm) >> - Merge branch 'master' into JEP-419 >> - Fix copyright header in TestArrayCopy >> - Fix failing microbenchmarks. Contributed by @FrauBoes (thanks!) >> - * use `invokeWithArguments` to simplify new test >> - Add test for liveness check with high-aririty downcalls >> (make sure that if an exception occurs in a downcall because of liveness, >> ref count of other resources are left intact). >> - * Fix javadoc issue in VaList >> * Fix bug in concurrent logic for shared scope acquire >> - Address review comments >> - ... and 7 more: https://git.openjdk.java.net/jdk/compare/5bb1992b...9b519343 > > src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 1035: > >> 1033: * >> 1034: * @param layout the layout of the memory region to be read. >> 1035: * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative()} segment, > > Suggestion: > > * @param offset offset in bytes (relative to this segment). For instance, if this segment is a {@link #isNative() native} segment, Same suggestion with all the other getters/setters below (I assume you wanted to add text to the link here?) 
> src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/MemorySegment.java line 1549: > >> 1547: * @param index index (relative to this segment). For instance, if this segment is a {@link #isNative()} segment, >> 1548: * the final address of this write operation can be expressed as {@code address().toRowLongValue() + (index * layout.byteSize())}. >> 1549: * @param value the byte value to be written. > > Suggestion: > > * @param value the address value to be written. I think all the setters have this problem. ------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From mcimadamore at openjdk.java.net Tue Nov 2 18:52:21 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Tue, 2 Nov 2021 18:52:21 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v12] In-Reply-To: References: Message-ID: On Tue, 2 Nov 2021 16:51:06 GMT, Jorn Vernee wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweak javadoc of loaderLookup > > src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ArenaAllocator.java line 88: > >> 86: if (size > arenaSize) { >> 87: throw new OutOfMemoryError(); >> 88: } > > Isn't this already covered by the `finally` block? Also, this seems to be checking the unaltered `size`, which I think should have been already done at the end of the previous `allocate` call right? I'll have to think some more about this. I don't think this is covered inside the block - that is, the block tries to allocate, and then in the finally we throw if we realized we've allocated too much. 
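The question of whether the bound is enforced before or after an allocation can be illustrated with a bookkeeping-only model. This is a sketch under stated assumptions, not the `ArenaAllocator` implementation: `BoundedArena` and its fields are hypothetical, and it tracks sizes without touching native memory. It checks each request against the remaining capacity up front, so the total handed out should never exceed the arena size.

```java
// Hypothetical bookkeeping-only model of a bounded arena. It tracks sizes and
// never touches native memory, so it is not the JDK's ArenaAllocator.
final class BoundedArena {
    private final long arenaSize;   // upper bound on the total bytes handed out
    private long allocated;         // bytes handed out so far

    BoundedArena(long arenaSize) {
        if (arenaSize < 0) {
            throw new IllegalArgumentException("negative arena size");
        }
        this.arenaSize = arenaSize;
    }

    /** Reserves {@code bytes} and returns the offset of the reserved region. */
    long allocate(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("negative size");
        }
        // Check before committing. Written as a subtraction so it cannot
        // overflow, even when arenaSize is Long.MAX_VALUE.
        if (bytes > arenaSize - allocated) {
            throw new OutOfMemoryError("arena bound exceeded");
        }
        long offset = allocated;
        allocated += bytes;
        return offset;
    }

    long allocated() {
        return allocated;
    }
}
```

Because the check happens before any state changes, a rejected request leaves the arena exactly as it was, which is the property a check in a `finally` block cannot give.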
------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From alanb at openjdk.java.net Tue Nov 2 19:49:15 2021 From: alanb at openjdk.java.net (Alan Bateman) Date: Tue, 2 Nov 2021 19:49:15 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v13] In-Reply-To: <1DhHETKpULKzqGU-0EU7qcdSWDngTBO1UMQ39E8qzBw=.ad279b49-57fb-4026-9049-862b4aef2ada@github.com> References: <1DhHETKpULKzqGU-0EU7qcdSWDngTBO1UMQ39E8qzBw=.ad279b49-57fb-4026-9049-862b4aef2ada@github.com> Message-ID: On Tue, 2 Nov 2021 19:35:29 GMT, Maurizio Cimadamore wrote: >> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.java.net/jeps/419 > > Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision: > > - Address impl review comments > - Address API review comments src/java.base/share/classes/java/lang/Module.java line 114: > 112: > 113: // true, if this module allows restricted native access; @Stable makes sure that modules that allow native > 114: // access capture this property as a constant. Do you mind fixing this comment to avoid the really long line, it sticks out compare to everything else around it. src/java.base/share/classes/sun/nio/ch/IOUtil.java line 478: > 476: private static final JavaNioAccess NIO_ACCESS = SharedSecrets.getJavaNioAccess(); > 477: > 478: static Runnable acquireScope(ByteBuffer bb, boolean async) { At some point (not this PR) we should move the "async" out of this file, IOUtil was for synchronous I/O. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From mcimadamore at openjdk.java.net Tue Nov 2 19:49:14 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Tue, 2 Nov 2021 19:49:14 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v13] In-Reply-To: References: Message-ID: <1DhHETKpULKzqGU-0EU7qcdSWDngTBO1UMQ39E8qzBw=.ad279b49-57fb-4026-9049-862b4aef2ada@github.com> > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with two additional commits since the last revision: - Address impl review comments - Address API review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/7cf4fcd9..1126133a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=12 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=11-12 Stats: 103 lines in 11 files changed: 8 ins; 23 del; 72 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From mcimadamore at openjdk.java.net Tue Nov 2 19:49:16 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Tue, 2 Nov 2021 19:49:16 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v12] In-Reply-To: References: Message-ID: On Tue, 2 Nov 2021 18:48:57 GMT, Maurizio Cimadamore wrote: >> src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ArenaAllocator.java line 88: >> >>> 86: if (size > arenaSize) { >>> 87: throw new OutOfMemoryError(); >>> 88: } >> >> Isn't this already covered by the `finally` block? 
Also, this seems to be checking the unaltered `size`, which I think should have been already done at the end of the previous `allocate` call right? > > I'll have to think some more about this. I don't think this is covered inside the block - that is, the block tries to allocate, and then in the finally we throw if we realized we've allocated too much. What is missing, I think, is a check (size > arenaSize) at the beginning of the method (we only check this in one of the paths). But we need to check before and after, I think, as it is possible to allocate a segment and then realize that we ended up overflowing the arena size. ------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From mcimadamore at openjdk.java.net Tue Nov 2 19:49:16 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Tue, 2 Nov 2021 19:49:16 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v12] In-Reply-To: References: Message-ID: On Tue, 2 Nov 2021 18:55:47 GMT, Maurizio Cimadamore wrote: >> I'll have to think some more about this. I don't think this is covered inside the block - that is, the block tries to allocate, and then in the finally we throw if we realized we've allocated too much. > > What is missing, I think, is a check (size > arenaSize) at the beginning of the method (we only check this in one of the paths). But we need to check before and after, I think, as it is possible to allocate a segment and then realize that we ended up overflowing the arena size. While what I said above correctly reflects what the implementation does, I think a broader issue is that the arena allocator implementation is allocating sometimes more native memory than what its contract specifies. While in some cases we can prevent that, I think in the general case (e.g. where we allocate a new block) we cannot, unless we add extra API guarantees - e.g. 
that the arena size should be a multiple of the block size (but then we'd have to special case `Long.MAX_VALUE`, or maybe pick a "big enough" power of two instead) ------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From jvernee at openjdk.java.net Tue Nov 2 19:49:16 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Tue, 2 Nov 2021 19:49:16 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v12] In-Reply-To: References: Message-ID: On Tue, 2 Nov 2021 19:02:51 GMT, Maurizio Cimadamore wrote: >> What is missing, I think, is a check (size > arenaSize) at the beginning of the method (we only check this in one of the paths). But we need to check before and after, I think, as it is possible to allocate a segment and then realize that we ended up overflowing the arena size. > > While what I said above correctly reflects what the implementation does, I think a broader issue is that the arena allocator implementation is allocating sometimes more native memory than what its contract specifies. While in some cases we can prevent that, I think in the general case (e.g. where we allocate a new block) we cannot, unless we add extra API guarantees - e.g. that the arena size should be a multiple of the block size (but then we'd have to special case `Long.MAX_VALUE`, or maybe pick a "big enough" power of two instead) Maybe we should not support block size in the case of a bounded arena. i.e. just allocate the whole thing upfront, and have 3 APIs: 1. arena with no bounds and default block size. 2. arena with no bounds and custom block size. 3. arena with bounds, that has no blocks size but allocates the whole thing in one go (could be modeled as block size = arena size). Right now we have 1. and 2., but instead of 3. we have a variant that allows setting both the arena size and block size. 
If we want to keep what we currently have, I'd suggest changing the arena size to a block count for the variant that takes both the arena size and the block size (I think in that case `Long.MAX_VALUE` should still work?). Any ways, that seems like something that could be addressed in 19 as well. ------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From mcimadamore at openjdk.java.net Tue Nov 2 21:33:46 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Tue, 2 Nov 2021 21:33:46 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v14] In-Reply-To: References: Message-ID: > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix long comment line in Module.java ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/1126133a..c219ae12 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=13 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=12-13 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From duke at openjdk.java.net Tue Nov 2 23:47:26 2021 From: duke at openjdk.java.net (Joshua Cao) Date: Tue, 2 Nov 2021 23:47:26 GMT Subject: RFR: 8274860: gcc 10.2.1 produces an uninitialized warning in sharedRuntimeTrig.cpp Message-ID: Initialize `fq` to an array to zeroes. 
------------- Commit messages: - 8274860: gcc 10.2.1 produces an uninitialized warning in sharedRuntimeTrig.cpp Changes: https://git.openjdk.java.net/jdk/pull/6220/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6220&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274860 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6220.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6220/head:pull/6220 PR: https://git.openjdk.java.net/jdk/pull/6220 From manc at openjdk.java.net Wed Nov 3 01:07:28 2021 From: manc at openjdk.java.net (Man Cao) Date: Wed, 3 Nov 2021 01:07:28 GMT Subject: RFR: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build Message-ID: Hi all, Could anyone provide some feedback on this bug fix and refactoring change? See https://bugs.openjdk.java.net/browse/JDK-8276453 for more details. If the direction of this change looks good, we can proceed removing the "UGLY HACK" in c1_LIR.hpp and refactor occurrences of "opr->fn()" to "opr.fn()". ------------- Commit messages: - Add _value field and rename LIR_OprDesc to LIR_Opr Changes: https://git.openjdk.java.net/jdk/pull/6221/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6221&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276453 Stats: 287 lines in 25 files changed: 23 ins; 16 del; 248 mod Patch: https://git.openjdk.java.net/jdk/pull/6221.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6221/head:pull/6221 PR: https://git.openjdk.java.net/jdk/pull/6221 From dholmes at openjdk.java.net Wed Nov 3 01:48:14 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 3 Nov 2021 01:48:14 GMT Subject: RFR: 8274860: gcc 10.2.1 produces an uninitialized warning in sharedRuntimeTrig.cpp In-Reply-To: References: Message-ID: On Tue, 2 Nov 2021 23:39:48 GMT, Joshua Cao wrote: > Initialize `fq` to an array to zeroes. Hi Joshua, This warning looks like a false positive to me. 
I'd prefer to see the warning disabled than make a change to highly optimised math code. Cheers, David ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6220 From njian at openjdk.java.net Wed Nov 3 03:15:15 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Wed, 3 Nov 2021 03:15:15 GMT Subject: RFR: 8271515: Integration of JEP 417: Vector API (Third Incubator) [v7] In-Reply-To: References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com> Message-ID: <4RJyhhtKPTjcJ894CoYqMYX0RdAsjRj0wwDcug9x4I8=.12d8e963-dc36-4cce-ad1b-241188dadd7b@github.com> On Wed, 27 Oct 2021 21:42:29 GMT, Paul Sandoz wrote: >> This PR improves the performance of vector operations that accept masks on architectures that support masking in hardware, specifically Intel AVX512 and ARM SVE. >> >> On architectures that do not support masking in hardware the same technique as before is applied to most operations, specifically composition using blend. >> >> Masked loads/stores are a special form of masked operation that require additional care to ensure out-of-bounds access throw exceptions. The range checking has not been fully optimized and will require further work. >> >> No API enhancements were required and only a few additional tests were needed. > > Paul Sandoz has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge branch 'master' into JDK-8271515-vector-api > - Merge pull request #1 from nsjian/JDK-8271515 > > Address AArch64 review comments from Nick. > - Address review comments from Nick. > - Merge branch 'master' into JDK-8271515-vector-api > - Resolve review comments. 
> - Merge branch 'master' into JDK-8271515-vector-api > - Apply patch from https://github.com/openjdk/panama-vector/pull/152 > - Apply patch from https://github.com/openjdk/panama-vector/pull/142 > - Apply patch from https://github.com/openjdk/panama-vector/pull/139 > - Apply patch from https://github.com/openjdk/panama-vector/pull/151 > - ... and 2 more: https://git.openjdk.java.net/jdk/compare/9a3e9542...c9a77225 src/hotspot/cpu/aarch64/aarch64_sve_ad.m4 line 2349: > 2347: BasicType to_bt = Matcher::vector_element_basic_type(this); > 2348: Assembler::SIMD_RegVariant to_size = __ elemType_to_regVariant(to_bt); > 2349: __ sve_fcvtzs(as_FloatRegister($dst$$reg), __ D, ptrue, as_FloatRegister($src$$reg), __ D); Converting from double to long and then narrow to target types did not follow JLS. I will fix it. Thanks to @fg1417 for helping to find out this issue. ------------- PR: https://git.openjdk.java.net/jdk/pull/5873 From shade at openjdk.java.net Wed Nov 3 09:30:13 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 3 Nov 2021 09:30:13 GMT Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v3] In-Reply-To: References: Message-ID: On Tue, 2 Nov 2021 06:25:33 GMT, Aleksey Shipilev wrote: >> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has interesting consequence for matchers: we can match the native `StrictMath.sqrt` to non-native intrinsic for `Math.sqrt`. Interpreter would then have to disambiguate the two. It could be made simpler and more consistent. >> >> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it. 
>>
>> There seem to be no performance regressions with this patch at least on Linux x86_64:
>>
>>
>> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench"
>>
>> Benchmark                    Mode  Cnt       Score      Error   Units
>>
>> ### Before
>>
>> StrictMathBench.minDouble   thrpt    4  230921.558 ±  234.238  ops/ms
>> StrictMathBench.minFloat    thrpt    4  230932.303 ±  126.721  ops/ms
>> StrictMathBench.minInt      thrpt    4  230917.256 ±   73.008  ops/ms
>> StrictMathBench.minLong     thrpt    4  194460.828 ±  178.079  ops/ms
>>
>>
>> StrictMathBench.maxDouble   thrpt    4  230983.180 ±  161.211  ops/ms
>> StrictMathBench.maxFloat    thrpt    4  230969.290 ±  277.500  ops/ms
>> StrictMathBench.maxInt      thrpt    4  231033.581 ±  200.015  ops/ms
>> StrictMathBench.maxLong     thrpt    4  194590.744 ±  114.295  ops/ms
>>
>>
>> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
>>
>> ### After
>>
>> StrictMathBench.minDouble   thrpt    4  230976.625 ±   67.338  ops/ms
>> StrictMathBench.minFloat    thrpt    4  230896.021 ±  270.434  ops/ms
>> StrictMathBench.minInt      thrpt    4  230859.741 ±  403.147  ops/ms
>> StrictMathBench.minLong     thrpt    4  194456.673 ±  111.557  ops/ms
>>
>> StrictMathBench.maxDouble   thrpt    4  230890.776 ±   89.924  ops/ms
>> StrictMathBench.maxFloat    thrpt    4  230918.334 ±   63.160  ops/ms
>> StrictMathBench.maxInt      thrpt    4  231059.128 ±   51.224  ops/ms
>> StrictMathBench.maxLong     thrpt    4  194488.210 ±  495.224  ops/ms
>>
>> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ±  247.330  ops/ms
>>
>>
>> Additional testing:
>> - [x] `StrictMath` benchmarks
>> - [x] Linux x86_64 fastdebug `java/lang/StrictMath`, `java/lang/Math`
>> - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Keep intrinsics on StrictMath

Thanks! I re-ran the tests, they seem to be fine. I need a second (R)eviewer for this.
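The delegation described in the quoted PR text can be spot-checked from plain Java. This is a minimal sketch, not part of the patch or its test suite: the class `StrictMathAgreement` and its helper names are hypothetical. It compares results bit for bit, which is the agreement one would expect given that `min`/`max` delegate to `Math` and that both `sqrt` variants are specified to return the correctly rounded result.

```java
// Spot-check that the StrictMath and Math entry points agree on the methods discussed.
// Hypothetical helper class; it is not part of the JDK or of the patch under review.
final class StrictMathAgreement {
    static boolean agree(double a, double b) {
        return same(StrictMath.min(a, b), Math.min(a, b))
            && same(StrictMath.max(a, b), Math.max(a, b))
            && same(StrictMath.sqrt(a), Math.sqrt(a));
    }

    // Bitwise comparison, so NaN matches NaN and -0.0 is distinguished from 0.0.
    private static boolean same(double x, double y) {
        return Double.doubleToLongBits(x) == Double.doubleToLongBits(y);
    }
}
```

A check like this says nothing about which code path (intrinsic, native or interpreted) produced the value; it only confirms that the observable results match for the sampled inputs.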
-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From aph at openjdk.java.net Wed Nov 3 09:37:16 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 3 Nov 2021 09:37:16 GMT
Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 2 Nov 2021 06:25:33 GMT, Aleksey Shipilev wrote:

>> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has interesting consequence for matchers: we can match the native `StrictMath.sqrt` to non-native intrinsic for `Math.sqrt`. Interpreter would then have to disambiguate the two. It could be made simpler and more consistent.
>>
>> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it.
>>
>> There seem to be no performance regressions with this patch at least on Linux x86_64:
>>
>>
>> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench"
>>
>> Benchmark                    Mode  Cnt       Score      Error   Units
>>
>> ### Before
>>
>> StrictMathBench.minDouble   thrpt    4  230921.558 ±  234.238  ops/ms
>> StrictMathBench.minFloat    thrpt    4  230932.303 ±  126.721  ops/ms
>> StrictMathBench.minInt      thrpt    4  230917.256 ±   73.008  ops/ms
>> StrictMathBench.minLong     thrpt    4  194460.828 ±  178.079  ops/ms
>>
>>
>> StrictMathBench.maxDouble   thrpt    4  230983.180 ±  161.211  ops/ms
>> StrictMathBench.maxFloat    thrpt    4  230969.290 ±  277.500  ops/ms
>> StrictMathBench.maxInt      thrpt    4  231033.581 ±  200.015  ops/ms
>> StrictMathBench.maxLong     thrpt    4  194590.744 ±  114.295  ops/ms
>>
>>
>> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
>>
>> ### After
>>
>> StrictMathBench.minDouble   thrpt    4  230976.625 ±   67.338  ops/ms
>> StrictMathBench.minFloat    thrpt    4  230896.021 ±  270.434  ops/ms
>> StrictMathBench.minInt      thrpt    4  230859.741 ±  403.147  ops/ms
>> StrictMathBench.minLong     thrpt    4  194456.673 ±  111.557  ops/ms
>>
>> StrictMathBench.maxDouble   thrpt    4  230890.776 ±   89.924  ops/ms
>> StrictMathBench.maxFloat    thrpt    4  230918.334 ±   63.160  ops/ms
>> StrictMathBench.maxInt      thrpt    4  231059.128 ±   51.224  ops/ms
>> StrictMathBench.maxLong     thrpt    4  194488.210 ±  495.224  ops/ms
>>
>> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ±  247.330  ops/ms
>>
>>
>> Additional testing:
>> - [x] `StrictMath` benchmarks
>> - [x] Linux x86_64 fastdebug `java/lang/StrictMath`, `java/lang/Math`
>> - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Keep intrinsics on StrictMath

Marked as reviewed by aph (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/6184

From duke at openjdk.java.net Wed Nov 3 10:03:20 2021
From: duke at openjdk.java.net (duke)
Date: Wed, 3 Nov 2021 10:03:20 GMT
Subject: Withdrawn: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI
In-Reply-To:
References:
Message-ID:

On Wed, 1 Sep 2021 18:03:11 GMT, Tom Rodriguez wrote:

> This evacuates all JVMCI related methods and fields into a separately declared struct.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5339

From aph at openjdk.java.net Wed Nov 3 10:17:32 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 3 Nov 2021 10:17:32 GMT
Subject: RFR: 8275586: Zero: Simplify interpreter initialization
In-Reply-To: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
Message-ID:

On Wed, 20 Oct 2021 07:44:36 GMT, Aleksey Shipilev wrote:

> The prolog in `BytecodeInterpreter` is hairy due to early initialization of interpreter statics.
Previous rewrites make it mostly redundant, and we can now simplify it.
>
> This also implicitly fixes an initialization bug. If `JvmtiExport::can_post_interpreter_events()` changes at runtime, we will call into the uninitialized version:
>
>
>   // Call the interpreter
>   if (JvmtiExport::can_post_interpreter_events()) {
>     BytecodeInterpreter::run<true>(istate);
>   } else {
>     BytecodeInterpreter::run<false>(istate);
>   }
>
>
> Additional testing:
> - [x] Linux x86_64 fastdebug `make bootcycle-images`

Marked as reviewed by aph (Reviewer).

src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 417:

> 415: #define THREAD istate->thread()
> 416: #endif
> 417:

This is a weirdly-hacky optimization, and is perhaps obsolete on modern compilers. While simplifying, I'd take it out.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6029

From shade at openjdk.java.net Wed Nov 3 10:17:33 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 3 Nov 2021 10:17:33 GMT
Subject: RFR: 8275586: Zero: Simplify interpreter initialization
In-Reply-To:
References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
Message-ID:

On Wed, 3 Nov 2021 10:12:19 GMT, Andrew Haley wrote:

>> The prolog in `BytecodeInterpreter` is hairy due to early initialization of interpreter statics. Previous rewrites make it mostly redundant, and we can now simplify it.
>>
>> This also implicitly fixes an initialization bug.
If `JvmtiExport::can_post_interpreter_events()` changes at runtime, we will call into the uninitialized version: >> >> >> // Call the interpreter >> if (JvmtiExport::can_post_interpreter_events()) { >> BytecodeInterpreter::run<true>(istate); >> } else { >> BytecodeInterpreter::run<false>(istate); >> } >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `make bootcycle-images` > > src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 417: > >> 415: #define THREAD istate->thread() >> 416: #endif >> 417: > > This is a weirdly-hacky optimization, and is perhaps obsolete on modern compilers. While simplifying, I'd take it out. I remember following up on this whole `LOTS_OF_REGS` mess, and it still seems profitable. I can take a look in a separate RFE, OK? ------------- PR: https://git.openjdk.java.net/jdk/pull/6029 From aph at openjdk.java.net Wed Nov 3 11:02:18 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 3 Nov 2021 11:02:18 GMT Subject: RFR: 8275586: Zero: Simplify interpreter initialization In-Reply-To: References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com> Message-ID: On Wed, 3 Nov 2021 10:14:24 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp line 417: >> >>> 415: #define THREAD istate->thread() >>> 416: #endif >>> 417: >> >> This is a weirdly-hacky optimization, and is perhaps obsolete on modern compilers. While simplifying, I'd take it out. > > I remember following up on this whole `LOTS_OF_REGS` mess, and it still seems profitable. I can take a look in a separate RFE, OK? OK.
------------- PR: https://git.openjdk.java.net/jdk/pull/6029 From mcimadamore at openjdk.java.net Wed Nov 3 11:32:50 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Wed, 3 Nov 2021 11:32:50 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v15] In-Reply-To: References: Message-ID: <3l5SgC7qqzs4wj1leQ3TKp4gqDMXozx6W6bUxO1wlTA=.5db656f9-98d2-474a-918a-f076e63be127@github.com> > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Simplify ArenaAllocator impl. The arena should respect its boundaries and never allocate more memory than its size specifies. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/c219ae12..7f847271 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=14 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=13-14 Stats: 40 lines in 1 file changed: 8 ins; 15 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From david.holmes at oracle.com Wed Nov 3 12:09:52 2021 From: david.holmes at oracle.com (David Holmes) Date: Wed, 3 Nov 2021 22:09:52 +1000 Subject: RFR: 8275718: Relax memory constraint on exception counter updates In-Reply-To: References: Message-ID: <90446117-abb2-d26b-0396-be21a6387252@oracle.com> On 1/11/2021 7:16 pm, Aleksey Shipilev wrote: > On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu wrote: > >> This is another instance of counter updates that only need atomic guarantee. 
> > (I am not arguing in favor or against this particular change, but I think we can talk a bit about generic stuff here...) > >> I don't know where this guarantee is coming from. Two r-m-w atomic ops must have some guarantee via coherence for the atomic op to actually work. And an implementation could make any atomic r-m-w implementation ensure global immediate visibility. But you cannot assume this is guaranteed for all hardware. Even for a given platform this would need to be a specified guarantee in the architecture manual, not just something deduced/inferred by reasoning. > > Hotspot's `memory_order_relaxed` is [aligned](https://github.com/openjdk/jdk/blob/5bb1992b8408a0d196b1afa308bc00d007458dbd/src/hotspot/share/runtime/atomic.hpp#L44-L45) with C++11 atomics semantics. C++11 atomic semantics for relaxed atomic ops requires [single modification order consistency](https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering), which implies [coherence](https://en.cppreference.com/w/cpp/atomic/memory_order#Modification_order). > > All known hardware platforms provide coherence out of the box (they are, indeed, cache-coherent platforms), that's why it is easy to implement in C++ (`mo_relaxed`) and in Java (`VarHandles.(get|set)opaque`). > > I am always confused by "immediate global visibility". The problem with statements that include "immediate", "before", "after" is that they leak in the notion of time, which is ill-defined for a single memory location without any reference to other variables. Maybe you can expand your concern with the example? Let me back up to be clear. I stated that memory-order-conservative might lower the chances (in a general platform-agnostic way) of seeing a stale value, compared to memory-order-relaxed, due to the stronger memory fence/barrier operation it implies. 
The response to that was: "value updated via atomic r-m-w operation should be visible to other threads guaranteed by coherence protocol" claiming that visibility guarantees were inherently present due to coherence regardless of what kind of memory fence/barrier were associated with the r-m-w atomic operation. I'm not sure if that is actually true. If it is true then we would not need any memory-order parameter on the r-m-w atomic operations because they would be all be the same due to this underlying coherence property. When I said "immediate global visibility" I was referring to a situation where once the write in the r-m-w atomic op had occurred then all subsequent reads would see the value of that write. It is true that such a thing may not require "immediacy" in a temporal sense, but the net effect is the same. David ----- > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/6065 > From david.holmes at oracle.com Wed Nov 3 12:23:47 2021 From: david.holmes at oracle.com (David Holmes) Date: Wed, 3 Nov 2021 22:23:47 +1000 Subject: RFR: 8275718: Relax memory constraint on exception counter updates In-Reply-To: <90446117-abb2-d26b-0396-be21a6387252@oracle.com> References: <90446117-abb2-d26b-0396-be21a6387252@oracle.com> Message-ID: Correction ... On 3/11/2021 10:09 pm, David Holmes wrote: > On 1/11/2021 7:16 pm, Aleksey Shipilev wrote: >> On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu wrote: >> >>> This is another instance of counter updates that only need atomic >>> guarantee. >> >> (I am not arguing in favor or against this particular change, but I >> think we can talk a bit about generic stuff here...) >> >>> I don't know where this guarantee is coming from. Two r-m-w atomic >>> ops must have some guarantee via coherence for the atomic op to >>> actually work. And an implementation could make any atomic r-m-w >>> implementation ensure global immediate visibility. But you cannot >>> assume this is guaranteed for all hardware. 
Even for a given platform >>> this would need to be a specified guarantee in the architecture >>> manual, not just something deduced/inferred by reasoning. >> >> Hotspot's `memory_order_relaxed` is >> [aligned](https://github.com/openjdk/jdk/blob/5bb1992b8408a0d196b1afa308bc00d007458dbd/src/hotspot/share/runtime/atomic.hpp#L44-L45) >> with C++11 atomics semantics. C++11 atomic semantics for relaxed >> atomic ops requires [single modification order >> consistency](https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering), >> which implies >> [coherence](https://en.cppreference.com/w/cpp/atomic/memory_order#Modification_order). >> >> >> All known hardware platforms provide coherence out of the box (they >> are, indeed, cache-coherent platforms), that's why it is easy to >> implement in C++ (`mo_relaxed`) and in Java >> (`VarHandles.(get|set)opaque`). >> >> I am always confused by "immediate global visibility". The problem >> with statements that include "immediate", "before", "after" is that >> they leak in the notion of time, which is ill-defined for a single >> memory location without any reference to other variables. Maybe you >> can expand your concern with the example? > > Let me back up to be clear. I stated that memory-order-conservative > might lower the chances (in a general platform-agnostic way) of seeing a > stale value, compared to memory-order-relaxed, due to the stronger > memory fence/barrier operation it implies. The response to that was: > > "value updated via atomic r-m-w operation should be visible to other > threads guaranteed by coherence protocol" > > claiming that visibility guarantees were inherently present due to > coherence regardless of what kind of memory fence/barrier were > associated with the r-m-w atomic operation. I'm not sure if that is > actually true. 
If it is true then we would not need any memory-order > parameter on the r-m-w atomic operations because they would be all be > the same due to this underlying coherence property. No that isn't true. I see now that the C++ "Modification Order" definition requires the write to the counter to be (for want of a better term) "immediately visible" to any subsequent read - so no stale value could be read. That is a far stronger guarantee than I expected from mo_relaxed. The use of other mo values on the r-m-w atomic operation impact the ordering between that variable and other atomic variables. David ----- > When I said "immediate global visibility" I was referring to a situation > where once the write in the r-m-w atomic op had occurred then all > subsequent reads would see the value of that write. It is true that such > a thing may not require "immediacy" in a temporal sense, but the net > effect is the same. > > David > ----- > >> ------------- >> >> PR: https://git.openjdk.java.net/jdk/pull/6065 >> From mcimadamore at openjdk.java.net Wed Nov 3 13:08:55 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Wed, 3 Nov 2021 13:08:55 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v16] In-Reply-To: References: Message-ID: > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Make ArenaAllocator impl more flexible in the face of OOME An ArenaAllocator should remain open for business, even if OOME is thrown in case other allocations can fit the arena size. 
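The behaviour described in that commit message can be illustrated with a minimal sketch. It is written in C++ rather than the Java of the actual change, and `BoundedArena` with its members is a hypothetical name, not the JDK's `ArenaAllocator` code. Under those assumptions, the arena never hands out more than its configured size, and a request that does not fit throws without closing the arena, so a later, smaller request can still succeed. Alignment is ignored for brevity.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical bounded arena: a fixed-size buffer with bump-pointer allocation.
class BoundedArena {
public:
    explicit BoundedArena(std::size_t size) : buffer_(size), used_(0) {}

    // Returns `bytes` bytes from the arena, or throws std::bad_alloc if the
    // request does not fit in the remaining space. A failed request leaves
    // the arena state untouched, so later (smaller) requests may still succeed.
    void* allocate(std::size_t bytes) {
        if (bytes > buffer_.size() - used_) {
            throw std::bad_alloc();
        }
        void* p = buffer_.data() + used_;
        used_ += bytes;
        return p;
    }

    // Number of bytes that can still be handed out.
    std::size_t remaining() const { return buffer_.size() - used_; }

private:
    std::vector<unsigned char> buffer_;  // backing storage, sized once
    std::size_t used_;                   // bump pointer offset
};
```

The key design point is that the bounds check happens before any state is modified, which is what keeps the arena "open for business" after a failure.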
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/7f847271..9fafb2a6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=15 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=14-15 Stats: 13 lines in 2 files changed: 3 ins; 6 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From jvernee at openjdk.java.net Wed Nov 3 13:40:16 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Wed, 3 Nov 2021 13:40:16 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v16] In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 13:08:55 GMT, Maurizio Cimadamore wrote: >> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.java.net/jeps/419 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Make ArenaAllocator impl more flexible in the face of OOME > An ArenaAllocator should remain open for business, even if OOME is thrown in case other allocations can fit the arena size. Marked as reviewed by jvernee (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From zgu at openjdk.java.net Wed Nov 3 16:54:26 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 3 Nov 2021 16:54:26 GMT Subject: RFR: 8275718: Relax memory constraint on exception counter updates In-Reply-To: References: Message-ID: On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu wrote: > This is another instance of counter updates that only need atomic guarantee. 
> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_ > > Correction ... > > On 3/11/2021 10:09 pm, David Holmes wrote: > > > On 1/11/2021 7:16 pm, Aleksey Shipilev wrote: > > > On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu wrote: > > > > This is another instance of counter updates that only need atomic > > > > guarantee. > > > > > > > > > (I am not arguing in favor or against this particular change, but I > > > think we can talk a bit about generic stuff here...) > > > > I don't know where this guarantee is coming from. Two r-m-w atomic > > > > ops must have some guarantee via coherence for the atomic op to > > > > actually work. And an implementation could make any atomic r-m-w > > > > implementation ensure global immediate visibility. But you cannot > > > > assume this is guaranteed for all hardware. Even for a given platform > > > > this would need to be a specified guarantee in the architecture > > > > manual, not just something deduced/inferred by reasoning. > > > > > > > > > Hotspot's `memory_order_relaxed` is > > > [aligned](https://github.com/openjdk/jdk/blob/5bb1992b8408a0d196b1afa308bc00d007458dbd/src/hotspot/share/runtime/atomic.hpp#L44-L45) > > > with C++11 atomics semantics. C++11 atomic semantics for relaxed > > > atomic ops requires [single modification order > > > consistency](https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering), > > > which implies > > > [coherence](https://en.cppreference.com/w/cpp/atomic/memory_order#Modification_order). > > > All known hardware platforms provide coherence out of the box (they > > > are, indeed, cache-coherent platforms), that's why it is easy to > > > implement in C++ (`mo_relaxed`) and in Java > > > (`VarHandles.(get|set)opaque`). > > > I am always confused by "immediate global visibility".
The problem > > > with statements that include "immediate", "before", "after" is that > > > they leak in the notion of time, which is ill-defined for a single > > > memory location without any reference to other variables. Maybe you > > > can expand your concern with the example? > > > > > > Let me back up to be clear. I stated that memory-order-conservative > > might lower the chances (in a general platform-agnostic way) of seeing a > > stale value, compared to memory-order-relaxed, due to the stronger > > memory fence/barrier operation it implies. The response to that was: > > "value updated via atomic r-m-w operation should be visible to other > > threads guaranteed by coherence protocol" > > claiming that visibility guarantees were inherently present due to > > coherence regardless of what kind of memory fence/barrier were > > associated with the r-m-w atomic operation. I'm not sure if that is > > actually true. If it is true then we would not need any memory-order > > parameter on the r-m-w atomic operations because they would be all be > > the same due to this underlying coherence property. > > No that isn't true. I see now that the C++ "Modification Order" definition requires the write to the counter to be (for want of a better term) "immediately visible" to any subsequent read - so no stale value could be read. That is a far stronger guarantee than I expected from mo_relaxed. The use of other mo values on the r-m-w atomic operation impact the ordering between that variable and other atomic variables. > > David ----- > Yes, for this single location atomic counter, there is no ordering involved. Although the counters are not hot, but more restricted memory constraints do not add any values. Are you okay with this change? Thanks, -Zhengyu > > When I said "immediate global visibility" I was referring to a situation > > where once the write in the r-m-w atomic op had occurred then all > > subsequent reads would see the value of that write. 
It is true that such > > a thing may not require "immediacy" in a temporal sense, but the net > > effect is the same. > > David > > ----- ------------- PR: https://git.openjdk.java.net/jdk/pull/6065 From mcimadamore at openjdk.java.net Wed Nov 3 17:40:56 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Wed, 3 Nov 2021 17:40:56 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v17] In-Reply-To: References: Message-ID: > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix TestUpcall * reverse() has a bug, as it doesn't tweak parameter types * reverse() is applied to the wrong MH ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/9fafb2a6..b9432473 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=16 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=15-16 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From shade at openjdk.java.net Wed Nov 3 17:42:19 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 3 Nov 2021 17:42:19 GMT Subject: RFR: 8276217: Harmonize StrictMath intrinsics handling [v3] In-Reply-To: References: Message-ID: <8z4CwkkYxAh283DZApwKTUKeqHgrohjezFmCX49g1dU=.f347492f-f369-40da-bacf-573f9ec9a997@github.com> On Tue, 2 Nov 2021 06:25:33 GMT, Aleksey Shipilev wrote: >> This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a 
second chance to match against the similar `Math` intrinsics. This has an interesting consequence for matchers: we can match the native `StrictMath.sqrt` to non-native intrinsic for `Math.sqrt`. Interpreter would then have to disambiguate the two. It could be made simpler and more consistent. >> >> For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it. >> >> There seem to be no performance regressions with this patch at least on Linux x86_64: >> >> >> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench" >> >> Benchmark Mode Cnt Score Error Units >> >> ### Before >> >> StrictMathBench.minDouble thrpt 4 230921.558 ± 234.238 ops/ms >> StrictMathBench.minFloat thrpt 4 230932.303 ± 126.721 ops/ms >> StrictMathBench.minInt thrpt 4 230917.256 ± 73.008 ops/ms >> StrictMathBench.minLong thrpt 4 194460.828 ± 178.079 ops/ms >> >> >> StrictMathBench.maxDouble thrpt 4 230983.180 ± 161.211 ops/ms >> StrictMathBench.maxFloat thrpt 4 230969.290 ± 277.500 ops/ms >> StrictMathBench.maxInt thrpt 4 231033.581 ± 200.015 ops/ms >> StrictMathBench.maxLong thrpt 4 194590.744 ± 114.295 ops/ms >> >> >> StrictMathBench.sqrtDouble thrpt 4 230722.037 ± 2222.080 ops/ms >> >> ### After >> >> StrictMathBench.minDouble thrpt 4 230976.625 ± 67.338 ops/ms >> StrictMathBench.minFloat thrpt 4 230896.021 ± 270.434 ops/ms >> StrictMathBench.minInt thrpt 4 230859.741 ± 403.147 ops/ms >> StrictMathBench.minLong thrpt 4 194456.673 ± 111.557 ops/ms >> >> StrictMathBench.maxDouble thrpt 4 230890.776 ± 89.924 ops/ms >> StrictMathBench.maxFloat thrpt 4 230918.334 ± 63.160 ops/ms >> StrictMathBench.maxInt thrpt 4 231059.128 ± 51.224 ops/ms >> StrictMathBench.maxLong thrpt 4 194488.210 ± 495.224 ops/ms >> >> StrictMathBench.sqrtDouble thrpt 4 231023.703 ±
247.330 ops/ms >> >> >> Additional testing: >> - [x] `StrictMath` benchmarks >> - [x] Linux x86_64 fastdebug `java/lang/StrictMath`, `java/lang/Math` >> - [x] Linux x86_64 fastdebug `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Keep intrinsics on StrictMath Thanks! I am going to push this tomorrow morning, if no other comments show up. ------------- PR: https://git.openjdk.java.net/jdk/pull/6184 From kvn at openjdk.java.net Wed Nov 3 19:00:23 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Nov 2021 19:00:23 GMT Subject: RFR: 8276571: C2: pass compilation options as structure Message-ID: Currently we pass several compilation options as separate arguments to `Compile`: Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); Originally we had only the `subsume_loads` option, but we have added a few since then and may add more. I suggest adding a new `Options` class to pass these values into `Compile`. ------------- Commit messages: - 8276571: C2: pass compilation options as structure Changes: https://git.openjdk.java.net/jdk/pull/6237/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6237&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276571 Stats: 66 lines in 4 files changed: 30 ins; 15 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/6237.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6237/head:pull/6237 PR: https://git.openjdk.java.net/jdk/pull/6237 From darcy at openjdk.java.net Wed Nov 3 21:06:31 2021 From: darcy at openjdk.java.net (Joe Darcy) Date: Wed, 3 Nov 2021 21:06:31 GMT Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources Message-ID: I noticed an out-of-date use of "ccc" in the HotSpot sources and grepped over the sources to find and fix all such instances.
------------- Commit messages: - JDK-8276588: Change "ccc" to "CSR" in HotSpot sources Changes: https://git.openjdk.java.net/jdk/pull/6240/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6240&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276588 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6240.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6240/head:pull/6240 PR: https://git.openjdk.java.net/jdk/pull/6240 From dcubed at openjdk.java.net Wed Nov 3 21:11:14 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 3 Nov 2021 21:11:14 GMT Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 20:58:23 GMT, Joe Darcy wrote: > I noticed an out-of-date use of "ccc" in the HotSpot sources and grepped over the sources to find and fix all such instances. Thumbs up. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6240 From kbarrett at openjdk.java.net Wed Nov 3 21:19:12 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 3 Nov 2021 21:19:12 GMT Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 20:58:23 GMT, Joe Darcy wrote: > I noticed an out-of-date use of "ccc" in the HotSpot sources and grepped over the sources to find and fix all such instances. Marked as reviewed by kbarrett (Reviewer). src/hotspot/share/oops/instanceKlass.cpp line 731: > 729: } > 730: > 731: // To remove these from requires an incompatible change and CSR review. I don't know what this comment is trying to say; I think there might be missing words or something. But the change for CCC -> CSR is fine. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6240 From darcy at openjdk.java.net Wed Nov 3 21:23:15 2021 From: darcy at openjdk.java.net (Joe Darcy) Date: Wed, 3 Nov 2021 21:23:15 GMT Subject: Integrated: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 20:58:23 GMT, Joe Darcy wrote: > I noticed an out-of-date use of "ccc" in the HotSpot sources and grepped over the sources to find and fix all such instances. This pull request has now been integrated. Changeset: f3320d2f Author: Joe Darcy URL: https://git.openjdk.java.net/jdk/commit/f3320d2fbd28349fa5eab3ea0da0ff0a3ef54c62 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod 8276588: Change "ccc" to "CSR" in HotSpot sources Reviewed-by: dcubed, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/6240 From dholmes at openjdk.java.net Thu Nov 4 01:32:08 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 4 Nov 2021 01:32:08 GMT Subject: RFR: 8275718: Relax memory constraint on exception counter updates In-Reply-To: References: Message-ID: On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu wrote: > This is another instance of counter updates that only need atomic guarantee. I'm not sure there is any actual benefit to this change, but I also do not see any harm. So okay. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). 
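The conclusion reached in the 8275718 thread can be sketched with standard C++11 atomics instead of HotSpot's `Atomic` wrappers; `EventCounter` is a hypothetical stand-in for the exception counters and is not code from the patch. A relaxed read-modify-write is still atomic, and the single modification order of the counter means concurrent increments are not lost. What relaxed ordering gives up is ordering with respect to other memory locations, which a standalone counter does not need.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical standalone event counter. Only atomicity of the update is
// required; no other data is published through this variable.
struct EventCounter {
    std::atomic<std::uint64_t> value{0};

    // Atomic increment with no ordering constraints on surrounding accesses.
    void increment() {
        value.fetch_add(1, std::memory_order_relaxed);
    }

    // Relaxed read: returns some value from the counter's modification order.
    std::uint64_t get() const {
        return value.load(std::memory_order_relaxed);
    }
};
```

If a reader ever needed to infer the state of other variables from the counter value, a stronger order (for example release on the writer and acquire on the reader) would be needed instead.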
PR: https://git.openjdk.java.net/jdk/pull/6065 From dholmes at openjdk.java.net Thu Nov 4 01:45:19 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 4 Nov 2021 01:45:19 GMT Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2] In-Reply-To: <6VLgphi_CCvby1B3jzpYuchN6ZT-dFaZ2e9VSba3YsQ=.62b863ac-0b29-47fa-a6d8-2ca49b8dd891@github.com> References: <6VLgphi_CCvby1B3jzpYuchN6ZT-dFaZ2e9VSba3YsQ=.62b863ac-0b29-47fa-a6d8-2ca49b8dd891@github.com> Message-ID: On Mon, 1 Nov 2021 07:36:53 GMT, Aleksey Shipilev wrote: >> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) to call to runtime for a single memory barrier. We can simplify the native `Unsafe` interface by falling back to `fullFence` when `{load|store}Fence` intrinsics are not available. This would be similar to what `Unsafe.{loadLoad|storeStore}Fences` do. >> >> This is the behavior of these intrinsics now, on x86_64, using benchmarks from JDK-8276054: >> >> >> Benchmark Mode Cnt Score Error Units >> >> # Default >> Single.acquire avgt 3 0.407 ± 0.060 ns/op >> Single.full avgt 3 4.693 ± 0.005 ns/op >> Single.loadLoad avgt 3 0.415 ± 0.095 ns/op >> Single.plain avgt 3 0.406 ± 0.002 ns/op >> Single.release avgt 3 0.408 ± 0.047 ns/op >> Single.storeStore avgt 3 0.408 ± 0.043 ns/op >> >> # -XX:DisableIntrinsic=_storeFence >> Single.acquire avgt 3 0.408 ± 0.016 ns/op >> Single.full avgt 3 4.694 ± 0.002 ns/op >> Single.loadLoad avgt 3 0.406 ± 0.002 ns/op >> Single.plain avgt 3 0.406 ± 0.001 ns/op >> Single.release avgt 3 4.694 ± 0.003 ns/op <--- upgraded to full >> Single.storeStore avgt 3 4.690 ± 0.005 ns/op <--- upgraded to full >> >> # -XX:DisableIntrinsic=_loadFence >> Single.acquire avgt 3 4.691 ± 0.001 ns/op <--- upgraded to full >> Single.full avgt 3 4.693 ± 0.009 ns/op >> Single.loadLoad avgt 3 4.693 ± 0.013 ns/op <--- upgraded to full >> Single.plain avgt 3 0.408 ±
0.072 ns/op >> Single.release avgt 3 0.415 ± 0.016 ns/op >> Single.storeStore avgt 3 0.416 ± 0.041 ns/op >> >> # -XX:DisableIntrinsic=_fullFence >> Single.acquire avgt 3 0.406 ± 0.014 ns/op >> Single.full avgt 3 15.836 ± 0.151 ns/op <--- calls runtime >> Single.loadLoad avgt 3 0.406 ± 0.001 ns/op >> Single.plain avgt 3 0.426 ± 0.361 ns/op >> Single.release avgt 3 0.407 ± 0.021 ns/op >> Single.storeStore avgt 3 0.410 ± 0.061 ns/op >> >> # -XX:DisableIntrinsic=_fullFence,_loadFence >> Single.acquire avgt 3 15.822 ± 0.282 ns/op <--- upgraded, calls runtime >> Single.full avgt 3 15.851 ± 0.127 ns/op <--- calls runtime >> Single.loadLoad avgt 3 15.829 ± 0.045 ns/op <--- upgraded, calls runtime >> Single.plain avgt 3 0.406 ± 0.001 ns/op >> Single.release avgt 3 0.414 ± 0.156 ns/op >> Single.storeStore avgt 3 0.422 ± 0.452 ns/op >> >> # -XX:DisableIntrinsic=_fullFence,_storeFence >> Single.acquire avgt 3 0.407 ± 0.016 ns/op >> Single.full avgt 3 15.347 ± 6.783 ns/op <--- calls runtime >> Single.loadLoad avgt 3 0.406 ± 0.001 ns/op >> Single.plain avgt 3 0.406 ± 0.002 ns/op >> Single.release avgt 3 15.828 ± 0.019 ns/op <--- upgraded, calls runtime >> Single.storeStore avgt 3 15.834 ± 0.045 ns/op <--- upgraded, calls runtime >> >> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence >> Single.acquire avgt 3 15.838 ± 0.030 ns/op <--- upgraded, calls runtime >> Single.full avgt 3 15.854 ± 0.277 ns/op <--- calls runtime >> Single.loadLoad avgt 3 15.826 ± 0.160 ns/op <--- upgraded, calls runtime >> Single.plain avgt 3 0.406 ± 0.003 ns/op >> Single.release avgt 3 15.838 ± 0.019 ns/op <--- upgraded, calls runtime >> Single.storeStore avgt 3 15.844 ± 0.104 ns/op <--- upgraded, calls runtime >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Restore RN for fullFence Marked as reviewed by dholmes (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/6149 From dholmes at openjdk.java.net Thu Nov 4 02:13:19 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 4 Nov 2021 02:13:19 GMT Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 21:15:29 GMT, Kim Barrett wrote: >> I noticed an out-of-date use of "ccc" in the HotSpot sources and grepped over the sources to find and fix all such instances. > > src/hotspot/share/oops/instanceKlass.cpp line 731: > >> 729: } >> 730: >> 731: // To remove these from requires an incompatible change and CSR review. > > I don't know what this comment is trying to say; I think there might be missing words or something. But the change for CCC -> CSR is fine. Given the 'R' in CSR already stands for Review this should have said "CSR request". But I also have no idea what the comment is actually trying to say - what is "these" referring to??? ------------- PR: https://git.openjdk.java.net/jdk/pull/6240 From duke at openjdk.java.net Thu Nov 4 02:39:09 2021 From: duke at openjdk.java.net (Fei Gao) Date: Thu, 4 Nov 2021 02:39:09 GMT Subject: RFR: 8274179: AArch64: Support SVE operations with encodable immediates In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 11:37:23 GMT, Andrew Haley wrote: >> for(int i = 0; i < LENGTH; i++) { >> c[i] = a[i] + 2; >> } >> >> For the case showed above, after superword optimization with SVE, >> without the patch, the vector add operation always has 2 z-reg inputs, >> like: >> mov z16.s, #2 >> add z17.s, z17.s, z16.s >> >> Considering sve has supported basic binary operations with immediate, >> this pattern could be further optimized to: >> add z16.s, z16.s, #2 >> >> To implement it, we added some new match rules and assembler rules in >> the aarch64 backend. We also made some extensions on immediate types >> and functions to keep backward compatible. 
> >> >> With the patch, only these binary integer vector operations, +(add), >> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for >> the optimization. Other vector operations are not supported currently. >> >> Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64 >> CPU, no new failure. >> >> There is no obvious performance uplift but it can help remove one >> redundant mov instruction. > > I'd like you to split this patch into two parts, please. > First, please use the new functions such as `Assembler::operand_valid_for_logical_immediate(bool is32, uint64_t imm)` only for SVE, leaving the existing logic in `Assembler` entirely untouched. This will cause some duplication, but that's OK. We can review changes to merge functionality in a separate patch. This will be much easier. @theRealAph, could you please help approve it? Thanks for your time :) ------------- PR: https://git.openjdk.java.net/jdk/pull/6115 From mli at openjdk.java.net Thu Nov 4 05:16:33 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 4 Nov 2021 05:16:33 GMT Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter Message-ID: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> Currently, Thread::_rcu_counter is not padded to a cache line; it should be beneficial to do so.
The initial spebjbb test shows about 10.5% improvement of critical, and 0.7% improvement of max in specjbb2015, specjbb arguments: GROUP_COUNT=4 TI_JVM_COUNT=1 JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g" MODE_ARGS="-ikv" ------------- Commit messages: - Initial commit Changes: https://git.openjdk.java.net/jdk/pull/6246/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6246&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276618 Stats: 8 lines in 3 files changed: 2 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/6246.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6246/head:pull/6246 PR: https://git.openjdk.java.net/jdk/pull/6246 From dholmes at openjdk.java.net Thu Nov 4 06:19:09 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 4 Nov 2021 06:19:09 GMT Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> Message-ID: On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li wrote: > Currently, Thread::_rcu_counter is not padded by cacheline, it should be beneficail to do so. > > The initial spebjbb test shows about 10.5% improvement of critical, and 0.7% improvement of max in specjbb2015, specjbb arguments: > GROUP_COUNT=4 > TI_JVM_COUNT=1 > JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g" > MODE_ARGS="-ikv" Hi Hamlin, This seems reasonable to me, however whenever we add padding to optimise the placement of one field, I always wonder if that same padding has de-optimised the placement of other fields? I think we need to see a broader run of benchmarks here and across more than just x86_64. I will see if I can assist on the benchmark front. 
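The concern about padding moving other fields can be made concrete with a small layout sketch. This is only an illustration under stated assumptions: the struct names, field names and the 64-byte line size are hypothetical and do not come from `thread.hpp`, and the sketch uses the standard `alignas` specifier rather than whatever padding mechanism the patch itself uses.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size for this sketch; real platforms differ.
constexpr std::size_t kCacheLine = 64;

// Hypothetical layout: a hot counter sits between two unrelated fields,
// so all three share one cache line.
struct Unpadded {
  std::uint64_t before;
  std::uint64_t counter;
  std::uint64_t after;
};

// Same fields, but the counter starts its own cache line and the field
// that follows it is pushed onto the next line.
struct Padded {
  std::uint64_t before;
  alignas(kCacheLine) std::uint64_t counter;
  alignas(kCacheLine) std::uint64_t after;
};

// Index of the cache line (relative to the object start) holding an offset.
constexpr std::size_t line_of(std::size_t offset) {
  return offset / kCacheLine;
}
```

Comparing `offsetof(Unpadded, after)` with `offsetof(Padded, after)` shows that isolating the counter also moves every later field, which is why a broader benchmark run is a sensible request.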
Thanks, David src/hotspot/share/runtime/thread.hpp line 253: > 251: > 252: // Support for GlobalCounter > 253: private: pre-existing nit: this private is not needed; nor is the public at line 260. ------------- PR: https://git.openjdk.java.net/jdk/pull/6246 From alanb at openjdk.java.net Thu Nov 4 07:29:19 2021 From: alanb at openjdk.java.net (Alan Bateman) Date: Thu, 4 Nov 2021 07:29:19 GMT Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 02:10:37 GMT, David Holmes wrote: >> src/hotspot/share/oops/instanceKlass.cpp line 731: >> >>> 729: } >>> 730: >>> 731: // To remove these from requires an incompatible change and CSR review. >> >> I don't know what this comment is trying to say; I think there might be missing words or something. But the change for CCC -> CSR is fine. > > Given the 'R' in CSR already stands for Review this should have said "CSR request". > > But I also have no idea what the comment is actually trying to say - what is "these" referring to??? I don't know why that comment is there. The API is Class::getSigners and any changes to its behavior would require a CSR, but we are free to change the implementation. So maybe the comment should be removed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6240 From mli at openjdk.java.net Thu Nov 4 07:30:09 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 4 Nov 2021 07:30:09 GMT Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> Message-ID: On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li wrote: > Currently, Thread::_rcu_counter is not padded by cacheline, it should be beneficail to do so. 
> > The initial spebjbb test shows about 10.5% improvement of critical, and 0.7% improvement of max in specjbb2015, specjbb arguments: > GROUP_COUNT=4 > TI_JVM_COUNT=1 > JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g" > MODE_ARGS="-ikv" Thanks a lot David, it will be very helpful. BTW, I will modify as you suggested later together with other's comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/6246 From shade at openjdk.java.net Thu Nov 4 08:03:09 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 4 Nov 2021 08:03:09 GMT Subject: RFR: 8275586: Zero: Simplify interpreter initialization In-Reply-To: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com> References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com> Message-ID: On Wed, 20 Oct 2021 07:44:36 GMT, Aleksey Shipilev wrote: > The prolog in `BytecodeInterpreter` is hairy due to early initialization of interpreter statics. Previous rewrites make it mostly redundant, and we can now simplify it. > > This also implicitly fixes a initialization bug. If `JvmtiExport::can_post_interpreter_events()` changes at runtime, we will call into the uninitialized version: > > > // Call the interpreter > if (JvmtiExport::can_post_interpreter_events()) { > BytecodeInterpreter::run(istate); > } else { > BytecodeInterpreter::run(istate); > } > > > Additional testing: > - [x] Linux x86_64 fastdebug `make bootcycle-images` I think I need a second (R)eviewer for this. 
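The initialization hazard in the quoted description can be sketched as follows. This is a minimal illustration, assuming the two branches of the quoted snippet call two different template instantiations of `run` (the template arguments appear to be missing from the snippet as archived). All names below are hypothetical and are not taken from the Zero sources.

```cpp
// Each instantiation of a class template has its own static state.
// Initializing one instantiation does nothing for the other.
template <bool JvmtiEnabled>
class Interpreter {
  static bool _initialized;
 public:
  static void initialize() { _initialized = true; }
  static bool is_initialized() { return _initialized; }
};

template <bool JvmtiEnabled>
bool Interpreter<JvmtiEnabled>::_initialized = false;

// Hypothetical stand-in for a capability that can change at runtime.
bool can_post_events = false;

// Reports whether the instantiation that would be selected right now
// has been initialized.
bool selected_interpreter_ready() {
  return can_post_events ? Interpreter<true>::is_initialized()
                         : Interpreter<false>::is_initialized();
}
```

If only the instantiation selected at startup is initialized and the flag flips later, the dispatch lands in the other, uninitialized instantiation, which is the kind of bug the description says the simplification removes.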
------------- PR: https://git.openjdk.java.net/jdk/pull/6029 From tschatzl at openjdk.java.net Thu Nov 4 08:04:10 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 4 Nov 2021 08:04:10 GMT Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> Message-ID: On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li wrote: > Currently, Thread::_rcu_counter is not padded by cacheline, it should be beneficail to do so. > > The initial spebjbb test shows about 10.5% improvement of critical, and 0.7% improvement of max in specjbb2015, specjbb arguments: > GROUP_COUNT=4 > TI_JVM_COUNT=1 > JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g" > MODE_ARGS="-ikv" I'll push it through our perf testing. ------------- PR: https://git.openjdk.java.net/jdk/pull/6246 From shade at openjdk.java.net Thu Nov 4 08:08:17 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 4 Nov 2021 08:08:17 GMT Subject: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2] In-Reply-To: <6VLgphi_CCvby1B3jzpYuchN6ZT-dFaZ2e9VSba3YsQ=.62b863ac-0b29-47fa-a6d8-2ca49b8dd891@github.com> References: <6VLgphi_CCvby1B3jzpYuchN6ZT-dFaZ2e9VSba3YsQ=.62b863ac-0b29-47fa-a6d8-2ca49b8dd891@github.com> Message-ID: On Mon, 1 Nov 2021 07:36:53 GMT, Aleksey Shipilev wrote: >> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) to call to runtime for a single memory barrier. We can simplify the native `Unsafe` interface by falling back to `fullFence` when `{load|store}Fence` intrinsics are not available. This would be similar to what `Unsafe.{loadLoad|storeStore}Fences` do. 
>> >> This is the behavior of these intrinsics now, on x86_64, using benchmarks from JDK-8276054: >> >> >> Benchmark Mode Cnt Score Error Units >> >> # Default >> Single.acquire avgt 3 0.407 ? 0.060 ns/op >> Single.full avgt 3 4.693 ? 0.005 ns/op >> Single.loadLoad avgt 3 0.415 ? 0.095 ns/op >> Single.plain avgt 3 0.406 ? 0.002 ns/op >> Single.release avgt 3 0.408 ? 0.047 ns/op >> Single.storeStore avgt 3 0.408 ? 0.043 ns/op >> >> # -XX:DisableIntrinsic=_storeFence >> Single.acquire avgt 3 0.408 ? 0.016 ns/op >> Single.full avgt 3 4.694 ? 0.002 ns/op >> Single.loadLoad avgt 3 0.406 ? 0.002 ns/op >> Single.plain avgt 3 0.406 ? 0.001 ns/op >> Single.release avgt 3 4.694 ? 0.003 ns/op <--- upgraded to full >> Single.storeStore avgt 3 4.690 ? 0.005 ns/op <--- upgraded to full >> >> # -XX:DisableIntrinsic=_loadFence >> Single.acquire avgt 3 4.691 ? 0.001 ns/op <--- upgraded to full >> Single.full avgt 3 4.693 ? 0.009 ns/op >> Single.loadLoad avgt 3 4.693 ? 0.013 ns/op <--- upgraded to full >> Single.plain avgt 3 0.408 ? 0.072 ns/op >> Single.release avgt 3 0.415 ? 0.016 ns/op >> Single.storeStore avgt 3 0.416 ? 0.041 ns/op >> >> # -XX:DisableIntrinsic=_fullFence >> Single.acquire avgt 3 0.406 ? 0.014 ns/op >> Single.full avgt 3 15.836 ? 0.151 ns/op <--- calls runtime >> Single.loadLoad avgt 3 0.406 ? 0.001 ns/op >> Single.plain avgt 3 0.426 ? 0.361 ns/op >> Single.release avgt 3 0.407 ? 0.021 ns/op >> Single.storeStore avgt 3 0.410 ? 0.061 ns/op >> >> # -XX:DisableIntrinsic=_fullFence,_loadFence >> Single.acquire avgt 3 15.822 ? 0.282 ns/op <--- upgraded, calls runtime >> Single.full avgt 3 15.851 ? 0.127 ns/op <--- calls runtime >> Single.loadLoad avgt 3 15.829 ? 0.045 ns/op <--- upgraded, calls runtime >> Single.plain avgt 3 0.406 ? 0.001 ns/op >> Single.release avgt 3 0.414 ? 0.156 ns/op >> Single.storeStore avgt 3 0.422 ? 0.452 ns/op >> >> # -XX:DisableIntrinsic=_fullFence,_storeFence >> Single.acquire avgt 3 0.407 ? 0.016 ns/op >> Single.full avgt 3 15.347 ? 
6.783 ns/op <--- calls runtime >> Single.loadLoad avgt 3 0.406 ? 0.001 ns/op >> Single.plain avgt 3 0.406 ? 0.002 ns/op >> Single.release avgt 3 15.828 ? 0.019 ns/op <--- upgraded, calls runtime >> Single.storeStore avgt 3 15.834 ? 0.045 ns/op <--- upgraded, calls runtime >> >> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence >> Single.acquire avgt 3 15.838 ? 0.030 ns/op <--- upgraded, calls runtime >> Single.full avgt 3 15.854 ? 0.277 ns/op <--- calls runtime >> Single.loadLoad avgt 3 15.826 ? 0.160 ns/op <--- upgraded, calls runtime >> Single.plain avgt 3 0.406 ? 0.003 ns/op >> Single.release avgt 3 15.838 ? 0.019 ns/op <--- upgraded, calls runtime >> Single.storeStore avgt 3 15.844 ? 0.104 ns/op <--- upgraded, calls runtime >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Restore RN for fullFence Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/6149 From shade at openjdk.java.net Thu Nov 4 08:08:18 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 4 Nov 2021 08:08:18 GMT Subject: Integrated: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence In-Reply-To: References: Message-ID: On Thu, 28 Oct 2021 08:47:31 GMT, Aleksey Shipilev wrote: > `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) to call to runtime for a single memory barrier. We can simplify the native `Unsafe` interface by falling back to `fullFence` when `{load|store}Fence` intrinsics are not available. This would be similar to what `Unsafe.{loadLoad|storeStore}Fences` do. > > This is the behavior of these intrinsics now, on x86_64, using benchmarks from JDK-8276054: > > > Benchmark Mode Cnt Score Error Units > > # Default > Single.acquire avgt 3 0.407 ? 0.060 ns/op > Single.full avgt 3 4.693 ? 
0.005 ns/op > Single.loadLoad avgt 3 0.415 ? 0.095 ns/op > Single.plain avgt 3 0.406 ? 0.002 ns/op > Single.release avgt 3 0.408 ? 0.047 ns/op > Single.storeStore avgt 3 0.408 ? 0.043 ns/op > > # -XX:DisableIntrinsic=_storeFence > Single.acquire avgt 3 0.408 ? 0.016 ns/op > Single.full avgt 3 4.694 ? 0.002 ns/op > Single.loadLoad avgt 3 0.406 ? 0.002 ns/op > Single.plain avgt 3 0.406 ? 0.001 ns/op > Single.release avgt 3 4.694 ? 0.003 ns/op <--- upgraded to full > Single.storeStore avgt 3 4.690 ? 0.005 ns/op <--- upgraded to full > > # -XX:DisableIntrinsic=_loadFence > Single.acquire avgt 3 4.691 ? 0.001 ns/op <--- upgraded to full > Single.full avgt 3 4.693 ? 0.009 ns/op > Single.loadLoad avgt 3 4.693 ? 0.013 ns/op <--- upgraded to full > Single.plain avgt 3 0.408 ? 0.072 ns/op > Single.release avgt 3 0.415 ? 0.016 ns/op > Single.storeStore avgt 3 0.416 ? 0.041 ns/op > > # -XX:DisableIntrinsic=_fullFence > Single.acquire avgt 3 0.406 ? 0.014 ns/op > Single.full avgt 3 15.836 ? 0.151 ns/op <--- calls runtime > Single.loadLoad avgt 3 0.406 ? 0.001 ns/op > Single.plain avgt 3 0.426 ? 0.361 ns/op > Single.release avgt 3 0.407 ? 0.021 ns/op > Single.storeStore avgt 3 0.410 ? 0.061 ns/op > > # -XX:DisableIntrinsic=_fullFence,_loadFence > Single.acquire avgt 3 15.822 ? 0.282 ns/op <--- upgraded, calls runtime > Single.full avgt 3 15.851 ? 0.127 ns/op <--- calls runtime > Single.loadLoad avgt 3 15.829 ? 0.045 ns/op <--- upgraded, calls runtime > Single.plain avgt 3 0.406 ? 0.001 ns/op > Single.release avgt 3 0.414 ? 0.156 ns/op > Single.storeStore avgt 3 0.422 ? 0.452 ns/op > > # -XX:DisableIntrinsic=_fullFence,_storeFence > Single.acquire avgt 3 0.407 ? 0.016 ns/op > Single.full avgt 3 15.347 ? 6.783 ns/op <--- calls runtime > Single.loadLoad avgt 3 0.406 ? 0.001 ns/op > Single.plain avgt 3 0.406 ? 0.002 ns/op > Single.release avgt 3 15.828 ? 0.019 ns/op <--- upgraded, calls runtime > Single.storeStore avgt 3 15.834 ? 
0.045 ns/op <--- upgraded, calls runtime > > # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence > Single.acquire avgt 3 15.838 ? 0.030 ns/op <--- upgraded, calls runtime > Single.full avgt 3 15.854 ? 0.277 ns/op <--- calls runtime > Single.loadLoad avgt 3 15.826 ? 0.160 ns/op <--- upgraded, calls runtime > Single.plain avgt 3 0.406 ? 0.003 ns/op > Single.release avgt 3 15.838 ? 0.019 ns/op <--- upgraded, calls runtime > Single.storeStore avgt 3 15.844 ? 0.104 ns/op <--- upgraded, calls runtime > > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` This pull request has now been integrated. Changeset: fb0be81f Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/fb0be81f0148d9aea73321a0c2bd83b2e477d952 Stats: 21 lines in 3 files changed: 6 ins; 11 del; 4 mod 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence Reviewed-by: psandoz, aph, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/6149 From shade at openjdk.java.net Thu Nov 4 08:11:18 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 4 Nov 2021 08:11:18 GMT Subject: Integrated: 8276217: Harmonize StrictMath intrinsics handling In-Reply-To: References: Message-ID: On Mon, 1 Nov 2021 11:23:16 GMT, Aleksey Shipilev wrote: > This blocks JDK-8276215: `StrictMath` intrinsics are handled peculiarly by giving failing intrinsics a second chance to match against the similar `Math` intrinsics. This has interesting consequence for matchers: we can match the native `StrictMath.sqrt` to non-native intrinsic for `Math.sqrt`. Interpreter would then have to disambiguate the two. It could be made simpler and more consistent. > > For `min`/`max` methods, `StrictMath` already delegates to `Math` methods, so we can just drop the intrinsics for them. `sqrt` is harder to delegate, because it is `native` and a part of public API, so we can instead do the proper special intrinsic for it. 
>
> There seem to be no performance regressions with this patch at least on Linux x86_64:
>
>
> $ CONF=linux-x86_64-server-release make test TEST="micro:StrictMathBench"
>
> Benchmark                    Mode  Cnt       Score      Error   Units
>
> ### Before
>
> StrictMathBench.minDouble   thrpt    4  230921.558 ±  234.238  ops/ms
> StrictMathBench.minFloat    thrpt    4  230932.303 ±  126.721  ops/ms
> StrictMathBench.minInt      thrpt    4  230917.256 ±   73.008  ops/ms
> StrictMathBench.minLong     thrpt    4  194460.828 ±  178.079  ops/ms
>
> StrictMathBench.maxDouble   thrpt    4  230983.180 ±  161.211  ops/ms
> StrictMathBench.maxFloat    thrpt    4  230969.290 ±  277.500  ops/ms
> StrictMathBench.maxInt      thrpt    4  231033.581 ±  200.015  ops/ms
> StrictMathBench.maxLong     thrpt    4  194590.744 ±  114.295  ops/ms
>
> StrictMathBench.sqrtDouble  thrpt    4  230722.037 ± 2222.080  ops/ms
>
> ### After
>
> StrictMathBench.minDouble   thrpt    4  230976.625 ±   67.338  ops/ms
> StrictMathBench.minFloat    thrpt    4  230896.021 ±  270.434  ops/ms
> StrictMathBench.minInt      thrpt    4  230859.741 ±  403.147  ops/ms
> StrictMathBench.minLong     thrpt    4  194456.673 ±  111.557  ops/ms
>
> StrictMathBench.maxDouble   thrpt    4  230890.776 ±   89.924  ops/ms
> StrictMathBench.maxFloat    thrpt    4  230918.334 ±   63.160  ops/ms
> StrictMathBench.maxInt      thrpt    4  231059.128 ±   51.224  ops/ms
> StrictMathBench.maxLong     thrpt    4  194488.210 ±  495.224  ops/ms
>
> StrictMathBench.sqrtDouble  thrpt    4  231023.703 ±  247.330  ops/ms
>
>
> Additional testing:
>  - [x] `StrictMath` benchmarks
>  - [x] Linux x86_64 fastdebug `java/lang/StrictMath`, `java/lang/Math`
>  - [x] Linux x86_64 fastdebug `tier1`

This pull request has now been integrated.
Changeset: 9eadcbb4 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/9eadcbb47e902f42d933ba68e24f2bfb0ee20915 Stats: 125 lines in 15 files changed: 80 ins; 27 del; 18 mod 8276217: Harmonize StrictMath intrinsics handling Reviewed-by: aph, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/6184 From mli at openjdk.java.net Thu Nov 4 08:38:19 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 4 Nov 2021 08:38:19 GMT Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> Message-ID: On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li wrote: > Currently, Thread::_rcu_counter is not padded by cacheline, it should be beneficail to do so. > > The initial spebjbb test shows about 10.5% improvement of critical, and 0.7% improvement of max in specjbb2015, specjbb arguments: > GROUP_COUNT=4 > TI_JVM_COUNT=1 > JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g" > MODE_ARGS="-ikv" Thanks a lot Thomas. :) ------------- PR: https://git.openjdk.java.net/jdk/pull/6246 From shade at openjdk.java.net Thu Nov 4 09:40:11 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 4 Nov 2021 09:40:11 GMT Subject: RFR: 8276571: C2: pass compilation options as structure In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 18:49:47 GMT, Vladimir Kozlov wrote: > Currently we pass several compilation options as separate arguments to `Compile`: > > Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); > > Originally we had only `subsume_loads` option but we added few since then and we may add more. > > I suggest to add new `Options` class to pass these values into `Compile`. 

I like the way it is going, but unfortunately I find the list of unnamed boolean arguments as confusing and error-prone as before... Could we use the "named parameter idiom" here, or some other way to name these parameters? Something like:

    class Options {
      bool _subsume_loads;
      bool _do_escape_analysis;
    public:
      Options() : _subsume_loads(false), _do_escape_analysis(false) {}
      // Each setter enables one option and returns *this so calls can be chained.
      Options& subsume_loads()      { _subsume_loads = true;      return *this; }
      Options& do_escape_analysis() { _do_escape_analysis = true; return *this; }
    };

src/hotspot/share/opto/compile.cpp line 490:

> 488: #ifndef PRODUCT
> 489:   // Check if recompiling
> 490:   if ((subsume_loads() == false) && PrintOpto) {

Suggestion:

      if (!subsume_loads() && PrintOpto) {

-------------

PR: https://git.openjdk.java.net/jdk/pull/6237

From adinn at openjdk.java.net Thu Nov 4 09:53:16 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Thu, 4 Nov 2021 09:53:16 GMT
Subject: RFR: 8275586: Zero: Simplify interpreter initialization
In-Reply-To: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com>
Message-ID:

On Wed, 20 Oct 2021 07:44:36 GMT, Aleksey Shipilev wrote:

> The prolog in `BytecodeInterpreter` is hairy due to early initialization of interpreter statics. Previous rewrites make it mostly redundant, and we can now simplify it.
>
> This also implicitly fixes an initialization bug. If `JvmtiExport::can_post_interpreter_events()` changes at runtime, we will call into the uninitialized version:
>
>
>   // Call the interpreter
>   if (JvmtiExport::can_post_interpreter_events()) {
>     BytecodeInterpreter::run(istate);
>   } else {
>     BytecodeInterpreter::run(istate);
>   }
>
>
> Additional testing:
>  - [x] Linux x86_64 fastdebug `make bootcycle-images`

Yes this looks good.

-------------

Marked as reviewed by adinn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6029 From shade at openjdk.java.net Thu Nov 4 10:26:17 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 4 Nov 2021 10:26:17 GMT Subject: RFR: 8275586: Zero: Simplify interpreter initialization In-Reply-To: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com> References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com> Message-ID: <_JLmXDetzSNbsPPQedgscbv9a-WSJ8N0i5xW1w7t9eI=.f328c164-5056-4a8d-b2f4-da51eace5d9e@github.com> On Wed, 20 Oct 2021 07:44:36 GMT, Aleksey Shipilev wrote: > The prolog in `BytecodeInterpreter` is hairy due to early initialization of interpreter statics. Previous rewrites make it mostly redundant, and we can now simplify it. > > This also implicitly fixes a initialization bug. If `JvmtiExport::can_post_interpreter_events()` changes at runtime, we will call into the uninitialized version: > > > // Call the interpreter > if (JvmtiExport::can_post_interpreter_events()) { > BytecodeInterpreter::run(istate); > } else { > BytecodeInterpreter::run(istate); > } > > > Additional testing: > - [x] Linux x86_64 fastdebug `make bootcycle-images` Cool, thank you. ------------- PR: https://git.openjdk.java.net/jdk/pull/6029 From shade at openjdk.java.net Thu Nov 4 10:26:18 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 4 Nov 2021 10:26:18 GMT Subject: Integrated: 8275586: Zero: Simplify interpreter initialization In-Reply-To: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com> References: <8FCUBqssHqcaYRC6gnr37F8A9gGX1Hzvx8ny5BQblOY=.b54f2029-7a37-493a-bcc9-6fec9c29c943@github.com> Message-ID: On Wed, 20 Oct 2021 07:44:36 GMT, Aleksey Shipilev wrote: > The prolog in `BytecodeInterpreter` is hairy due to early initialization of interpreter statics. Previous rewrites make it mostly redundant, and we can now simplify it. 
> > This also implicitly fixes a initialization bug. If `JvmtiExport::can_post_interpreter_events()` changes at runtime, we will call into the uninitialized version: > > > // Call the interpreter > if (JvmtiExport::can_post_interpreter_events()) { > BytecodeInterpreter::run(istate); > } else { > BytecodeInterpreter::run(istate); > } > > > Additional testing: > - [x] Linux x86_64 fastdebug `make bootcycle-images` This pull request has now been integrated. Changeset: 3613ce7c Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/3613ce7c7d5bc8b7d603e1cf6a123588339aed3f Stats: 70 lines in 3 files changed: 7 ins; 48 del; 15 mod 8275586: Zero: Simplify interpreter initialization Reviewed-by: aph, adinn ------------- PR: https://git.openjdk.java.net/jdk/pull/6029 From simonis at openjdk.java.net Thu Nov 4 12:18:46 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 4 Nov 2021 12:18:46 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v4] In-Reply-To: References: Message-ID: > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>
> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. > - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters - Fix special case where we're creating an implicit exception for a regular invoke* bytecode - Minor updates as requested by @TheRealMDoerr - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow ------------- Changes: https://git.openjdk.java.net/jdk/pull/5488/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=03 Stats: 747 lines in 15 files changed: 739 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From zgu at openjdk.java.net Thu Nov 4 12:28:16 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 4 Nov 2021 12:28:16 GMT Subject: RFR: 8275718: Relax memory constraint on exception counter updates In-Reply-To: References: Message-ID: <6ngB1es9Q-dgch7Z4qRzxSksn01hTlpVLmWHdYHUn98=.bef754fb-710d-4607-9a81-09135e70eb77@github.com> On Thu, 4 Nov 2021 01:29:03 GMT, David Holmes wrote: > I'm not sure there is any actual benefit to this change, but I also do not see any harm. So okay. > > Thanks, David Thanks, @dholmes-ora I don't believe it has measurable impact neither. In theory, mo_conservative is much more expensive ... -Zhengyu ------------- PR: https://git.openjdk.java.net/jdk/pull/6065 From simonis at openjdk.java.net Thu Nov 4 12:35:11 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 4 Nov 2021 12:35:11 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v4] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 12:18:46 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. 
This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ? 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ? 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ? 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ? 1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ? 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ? 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ? 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ? 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. 
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is is not true in genral for bytecode which can cause an implicit exception. >> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters > - Fix special case where we're creating an implicit exception for a regular invoke* bytecode > - Minor updates as requested by @TheRealMDoerr > - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow Hi, sorry for the delay. 
I've had a look at the IR Test Framework but I didn't find it to be the best fit for this change. I also wanted to have a test which works in both product and debug builds. So I have instead extended the Whitebox API to expose the decompile, deopt and trap counters. I think (and hope) this functionality will be helpful for others in the future. The test itself got quite elaborate, which is partially because different built-in exceptions are currently profiled and compiled differently (see [JDK-8275908: Record null_check traps for calls and array_check traps in the interpreter](https://bugs.openjdk.java.net/browse/JDK-8275908)). The current jtreg test can also serve as a test for JDK-8275908 once it is fixed (just have to set the `JDK8275908_fixed` field to `true`). As I've mentioned before, I did run a full set of jtreg and JCK tests together with some benchmark suites with a special build with `-XX:-OmitStackTraceInFastThrow` disabled by default and couldn't find any issue. Please take a look, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From stuefe at openjdk.java.net Thu Nov 4 13:33:28 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 4 Nov 2021 13:33:28 GMT Subject: RFR: JDK-8272065: jcmd cannot rely on the old core reflection implementation which will be changed after JEP 416 Message-ID: `VM.metaspace`, `VM.classloaders` and `VM.class_hierarchy` all print out reflection invocation targets for delegating reflection class loaders. Post JEP 416 we don't use DelegatingClassLoaders anymore. This patch removes the display of reflection targets from these commands as well as associated helper code and tests. I don't have enough time atm to reimplement this feature using method handles. But at least we can remove the old code, and prepare the way for more code removal. The patch does not touch vmClasses, `reflect_ConstructorAccessor` and `reflect_MethodAccessor` are both still there. Tests: GHAs, manually testing the commands. 
------------- Commit messages: - Remove reflection invocation target printing from VM.metaspace, VM.classloaders, VM.class_hierarchy Changes: https://git.openjdk.java.net/jdk/pull/6257/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6257&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272065 Stats: 368 lines in 8 files changed: 0 ins; 367 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6257.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6257/head:pull/6257 PR: https://git.openjdk.java.net/jdk/pull/6257 From psandoz at openjdk.java.net Thu Nov 4 15:56:46 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 4 Nov 2021 15:56:46 GMT Subject: RFR: 8271515: Integration of JEP 417: Vector API (Third Incubator) [v8] In-Reply-To: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com> References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com> Message-ID: > This PR improves the performance of vector operations that accept masks on architectures that support masking in hardware, specifically Intel AVX512 and ARM SVE. > > On architectures that do not support masking in hardware the same technique as before is applied to most operations, specifically composition using blend. > > Masked loads/stores are a special form of masked operation that require additional care to ensure out-of-bounds access throw exceptions. The range checking has not been fully optimized and will require further work. > > No API enhancements were required and only a few additional tests were needed. 
Paul Sandoz has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #2 from nsjian/vector-conversion-fix AArch64: Incorrect SVE double to int and float to long vector conversion - Incorrect double to int and float to long vector conversion Like JDK-8276151, SVE vector double to int and float to long conversions have similar issue. According to Java language specification [1], we should convert double/float to integer/long directly, instead of converting to long/int and then narrowing/extending to target types. Test cases will be updated in JDK-8276151. [1] https://docs.oracle.com/javase/specs/jls/se17/html/jls-5.html#jls-5.1.3 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5873/files - new: https://git.openjdk.java.net/jdk/pull/5873/files/c9a77225..571e6f39 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5873&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5873&range=06-07 Stats: 40 lines in 2 files changed: 22 ins; 4 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/5873.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5873/head:pull/5873 PR: https://git.openjdk.java.net/jdk/pull/5873 From simonis at openjdk.java.net Thu Nov 4 16:02:48 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 4 Nov 2021 16:02:48 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v5] In-Reply-To: References: Message-ID: > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): > > public static boolean isAlpha(int c) { > try { > return IS_ALPHA[c]; > } catch (ArrayIndexOutOfBoundsException ex) { > return false; > } > } > > > ### Solution > > Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: > > -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions > Benchmark (exceptionProbability) Mode Cnt Score Error Units > ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op > ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op > ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op > ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op > > -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions > Benchmark (exceptionProbability) Mode Cnt Score Error Units > ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op > ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op > ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op > ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op > > > ### Implementation details > > - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. > - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception. > - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. 
> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. > - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Fix build issue for minimal/zero build ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5488/files - new: https://git.openjdk.java.net/jdk/pull/5488/files/8043f8d0..bdf37bf2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=03-04 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From kvn at openjdk.java.net Thu Nov 4 16:16:36 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 4 Nov 2021 16:16:36 GMT Subject: RFR: 8276571: C2: pass compilation options as structure [v2] In-Reply-To: References: Message-ID: > Currently we pass several compilation options as separate arguments to `Compile`: > > Compile C(env, target, 
entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); > > Originally we had only `subsume_loads` option but we added few since then and we may add more. > > I suggest to add new `Options` class to pass these values into `Compile`. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6237/files - new: https://git.openjdk.java.net/jdk/pull/6237/files/34f29c8d..d2490eb4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6237&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6237&range=00-01 Stats: 12 lines in 2 files changed: 8 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6237.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6237/head:pull/6237 PR: https://git.openjdk.java.net/jdk/pull/6237 From kvn at openjdk.java.net Thu Nov 4 16:16:38 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 4 Nov 2021 16:16:38 GMT Subject: RFR: 8276571: C2: pass compilation options as structure [v2] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 09:29:44 GMT, Aleksey Shipilev wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/share/opto/compile.cpp line 490: > >> 488: #ifndef PRODUCT >> 489: // Check if recompiling >> 490: if ((subsume_loads() == false) && PrintOpto) { > > Suggestion: > > if (!subsume_loads() && PrintOpto) { done ------------- PR: https://git.openjdk.java.net/jdk/pull/6237 From shade at openjdk.java.net Thu Nov 4 16:21:10 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 4 Nov 2021 16:21:10 GMT Subject: RFR: 8276571: C2: pass compilation options as structure [v2] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 16:16:36 GMT, Vladimir Kozlov wrote: >> 
Currently we pass several compilation options as separate arguments to `Compile`: >> >> Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); >> >> Originally we had only `subsume_loads` option but we added few since then and we may add more. >> >> I suggest to add new `Options` class to pass these values into `Compile`. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments All right, this works too. Next user of `Options` would probably have to introduce per-use factory methods to disambiguate constructors, so maybe we could do this early on. class Options { public: static Options for_runtime_stub_gen() { return Options( /* subsume_loads = */ true, /* do_escape_analysis = */ false, /* eliminate_boxing = */ false, /* do_locks_coarsening = */ false, /* install_code = */ true ); } }; ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6237 From kvn at openjdk.java.net Thu Nov 4 16:25:11 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 4 Nov 2021 16:25:11 GMT Subject: RFR: 8276571: C2: pass compilation options as structure [v2] In-Reply-To: References: Message-ID: <-H2T_dh5-4rkYwmEZRdw9dIkyyBsLIVdeqbbGLgoq_s=.1eda85b6-8dbf-4373-bd15-f07b767c0665@github.com> On Thu, 4 Nov 2021 16:16:36 GMT, Vladimir Kozlov wrote: >> Currently we pass several compilation options as separate arguments to `Compile`: >> >> Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); >> >> Originally we had only `subsume_loads` option but we added few since then and we may add more. >> >> I suggest to add new `Options` class to pass these values into `Compile`. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Thank you, Aleksey, for review. ------------- PR: https://git.openjdk.java.net/jdk/pull/6237 From simonis at openjdk.java.net Thu Nov 4 16:28:52 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 4 Nov 2021 16:28:52 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6] In-Reply-To: References: Message-ID: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): > > public static boolean isAlpha(int c) { > try { > return IS_ALPHA[c]; > } catch (ArrayIndexOutOfBoundsException ex) { > return false; > } > } > > > ### Solution > > Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: > > -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions > Benchmark (exceptionProbability) Mode Cnt Score Error Units > ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op > ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op > ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 
1205.104 ns/op > ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op > > -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions > Benchmark (exceptionProbability) Mode Cnt Score Error Units > ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op > ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op > ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op > ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op > > > ### Implementation details > > - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. > - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception. > - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. > - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. > - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. 
no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5488/files - new: https://git.openjdk.java.net/jdk/pull/5488/files/bdf37bf2..99db7e54 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=04-05 Stats: 30 lines in 1 file changed: 30 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From kvn at openjdk.java.net Thu Nov 4 16:39:47 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 4 Nov 2021 16:39:47 GMT Subject: RFR: 8276571: C2: pass compilation options as structure [v2] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 16:17:44 GMT, Aleksey Shipilev wrote: > All right, this works too. Next user of `Options` would probably have to introduce per-use factory methods to disambiguate constructors, so maybe we could do this early on. I agree. ------------- PR: https://git.openjdk.java.net/jdk/pull/6237 From kvn at openjdk.java.net Thu Nov 4 16:39:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 4 Nov 2021 16:39:45 GMT Subject: RFR: 8276571: C2: pass compilation options as structure [v3] In-Reply-To: References: Message-ID: > Currently we pass several compilation options as separate arguments to `Compile`: > > Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); > > Originally we had only `subsume_loads` option but we added few since then and we may add more. 
> > I suggest to add new `Options` class to pass these values into `Compile`. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Per-use Options factory method ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6237/files - new: https://git.openjdk.java.net/jdk/pull/6237/files/d2490eb4..4565547e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6237&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6237&range=01-02 Stats: 9 lines in 2 files changed: 2 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/6237.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6237/head:pull/6237 PR: https://git.openjdk.java.net/jdk/pull/6237 From mchung at openjdk.java.net Thu Nov 4 17:05:22 2021 From: mchung at openjdk.java.net (Mandy Chung) Date: Thu, 4 Nov 2021 17:05:22 GMT Subject: RFR: JDK-8272065: jcmd cannot rely on the old core reflection implementation which will be changed after JEP 416 In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 13:25:14 GMT, Thomas Stuefe wrote: > `VM.metaspace`, `VM.classloaders` and `VM.class_hierarchy` all print out reflection invocation targets for delegating reflection class loaders. Post JEP 416 we don't use DelegatingClassLoaders anymore. > > This patch removes the display of reflection targets from these commands as well as associated helper code and tests. > > I don't have enough time atm to reimplement this feature using method handles. But at least we can remove the old code, and prepare the way for more code removal. > > The patch does not touch vmClasses, `reflect_ConstructorAccessor` and `reflect_MethodAccessor` are both still there. > > Tests: GHAs, manually testing the commands. Looks good to me. Thanks for following this up. The new implementation does not spin any new class loader and so I don't think jcmd needs to extend its support for the new implementation using method handles. 
------------- Marked as reviewed by mchung (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6257 From stuefe at openjdk.java.net Thu Nov 4 18:33:12 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 4 Nov 2021 18:33:12 GMT Subject: RFR: JDK-8272065: jcmd cannot rely on the old core reflection implementation which will be changed after JEP 416 In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 17:02:05 GMT, Mandy Chung wrote: > Looks good to me. Thanks for following this up. The new implementation does not spin any new class loader and so I don't think jcmd needs to extend its support for the new implementation using method handles. Thank you Mandy! ------------- PR: https://git.openjdk.java.net/jdk/pull/6257 From coleenp at openjdk.java.net Thu Nov 4 19:16:09 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 4 Nov 2021 19:16:09 GMT Subject: RFR: JDK-8272065: jcmd cannot rely on the old core reflection implementation which will be changed after JEP 416 In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 13:25:14 GMT, Thomas Stuefe wrote: > `VM.metaspace`, `VM.classloaders` and `VM.class_hierarchy` all print out reflection invocation targets for delegating reflection class loaders. Post JEP 416 we don't use DelegatingClassLoaders anymore. > > This patch removes the display of reflection targets from these commands as well as associated helper code and tests. > > I don't have enough time atm to reimplement this feature using method handles. But at least we can remove the old code, and prepare the way for more code removal. > > The patch does not touch vmClasses, `reflect_ConstructorAccessor` and `reflect_MethodAccessor` are both still there. > > Tests: GHAs, manually testing the commands. Yes, looks good to me also. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6257 From minqi at openjdk.java.net Thu Nov 4 19:26:14 2021 From: minqi at openjdk.java.net (Yumin Qi) Date: Thu, 4 Nov 2021 19:26:14 GMT Subject: RFR: 8275718: Relax memory constraint on exception counter updates In-Reply-To: References: Message-ID: <3QIVN0gOFurHxsfnBAqIwJGC25AbniAUb4jizq2ffyw=.4dcd7164-72d3-45fd-b202-0b540a0d6263@github.com> On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu wrote: > This is another instance of counter updates that only need atomic guarantee. LGTM. ------------- Marked as reviewed by minqi (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6065 From zgu at openjdk.java.net Thu Nov 4 19:44:21 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 4 Nov 2021 19:44:21 GMT Subject: RFR: 8275718: Relax memory constraint on exception counter updates In-Reply-To: <3QIVN0gOFurHxsfnBAqIwJGC25AbniAUb4jizq2ffyw=.4dcd7164-72d3-45fd-b202-0b540a0d6263@github.com> References: <3QIVN0gOFurHxsfnBAqIwJGC25AbniAUb4jizq2ffyw=.4dcd7164-72d3-45fd-b202-0b540a0d6263@github.com> Message-ID: <0VzHUKSj8F8J8OpE0HSgXigHmXjrKy6h5hhdsFcFZhI=.a8ed0b94-2123-46d2-8d4f-755cfb6e5f3a@github.com> On Thu, 4 Nov 2021 19:22:49 GMT, Yumin Qi wrote: >> This is another instance of counter updates that only need atomic guarantee. > > LGTM. Thanks, @yminqi ------------- PR: https://git.openjdk.java.net/jdk/pull/6065 From zgu at openjdk.java.net Thu Nov 4 19:44:21 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 4 Nov 2021 19:44:21 GMT Subject: Integrated: 8275718: Relax memory constraint on exception counter updates In-Reply-To: References: Message-ID: On Thu, 21 Oct 2021 15:16:28 GMT, Zhengyu Gu wrote: > This is another instance of counter updates that only need atomic guarantee. This pull request has now been integrated. 
Changeset: 2b5a32c7 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/2b5a32c73f22c69d7ccedac761af1dbb4a7f297d Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod 8275718: Relax memory constraint on exception counter updates Reviewed-by: dholmes, minqi ------------- PR: https://git.openjdk.java.net/jdk/pull/6065 From ngasson at openjdk.java.net Fri Nov 5 02:43:12 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Fri, 5 Nov 2021 02:43:12 GMT Subject: RFR: 8271515: Integration of JEP 417: Vector API (Third Incubator) [v8] In-Reply-To: References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com> Message-ID: On Thu, 4 Nov 2021 15:56:46 GMT, Paul Sandoz wrote: >> This PR improves the performance of vector operations that accept masks on architectures that support masking in hardware, specifically Intel AVX512 and ARM SVE. >> >> On architectures that do not support masking in hardware the same technique as before is applied to most operations, specifically composition using blend. >> >> Masked loads/stores are a special form of masked operation that require additional care to ensure out-of-bounds access throw exceptions. The range checking has not been fully optimized and will require further work. >> >> No API enhancements were required and only a few additional tests were needed. > > Paul Sandoz has updated the pull request incrementally with two additional commits since the last revision: > > - Merge pull request #2 from nsjian/vector-conversion-fix > > AArch64: Incorrect SVE double to int and float to long vector conversion > - Incorrect double to int and float to long vector conversion > > Like JDK-8276151, SVE vector double to int and float to long > conversions have similar issue. According to Java language > specification [1], we should convert double/float to > integer/long directly, instead of converting to long/int and then > narrowing/extending to target types. 
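As a scalar illustration of the JLS 5.1.3 point in the quoted commit message, the following Java sketch compares a direct conversion with a two-step one that goes through an intermediate integral type. It only shows the language-level semantics the vector code has to match; it does not reproduce the SVE code generation, and the class and method names (`ConversionSketch`, `doubleToIntViaLong`, and so on) are made up for this example.

```java
// Illustrative sketch only; class and method names are hypothetical.
final class ConversionSketch {
    // Direct narrowing as specified by JLS 5.1.3: out-of-range values saturate.
    static int doubleToIntDirect(double d) { return (int) d; }

    // Two-step variant: double -> long, then long -> int keeps only the low 32 bits.
    static int doubleToIntViaLong(double d) { return (int) (long) d; }

    // Direct conversion: the value fits in a long and is preserved.
    static long floatToLongDirect(float f) { return (long) f; }

    // Two-step variant: float -> int saturates first, then widens to long.
    static long floatToLongViaInt(float f) { return (long) (int) f; }

    public static void main(String[] args) {
        double d = 1e10;   // larger than Integer.MAX_VALUE
        float f = 3e9f;    // exactly representable as a float, larger than Integer.MAX_VALUE

        System.out.println(doubleToIntDirect(d));   // 2147483647
        System.out.println(doubleToIntViaLong(d));  // 1410065408
        System.out.println(floatToLongDirect(f));   // 3000000000
        System.out.println(floatToLongViaInt(f));   // 2147483647
    }
}
```

The two paths disagree exactly when the value is out of range for the narrower type, which is why the intermediate step cannot be used as a shortcut.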
Test cases will be updated in > JDK-8276151. > > [1] https://docs.oracle.com/javase/specs/jls/se17/html/jls-5.html#jls-5.1.3 Marked as reviewed by ngasson (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5873 From njian at openjdk.java.net Fri Nov 5 02:43:13 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 5 Nov 2021 02:43:13 GMT Subject: RFR: 8271515: Integration of JEP 417: Vector API (Third Incubator) [v7] In-Reply-To: <4RJyhhtKPTjcJ894CoYqMYX0RdAsjRj0wwDcug9x4I8=.12d8e963-dc36-4cce-ad1b-241188dadd7b@github.com> References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com> <4RJyhhtKPTjcJ894CoYqMYX0RdAsjRj0wwDcug9x4I8=.12d8e963-dc36-4cce-ad1b-241188dadd7b@github.com> Message-ID: On Wed, 3 Nov 2021 03:10:16 GMT, Ningsheng Jian wrote: > Converting from double to long and then narrow to target types did not follow JLS. I will fix it. Thanks to @fg1417 for helping to find out this issue. Fixed in the new commit. Thanks to @PaulSandoz for integrating the fix! Hi Nick @nick-arm , Could you please help to review the new commit, which fixes the same issue as JDK-8276151 for SVE? Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/5873 From dholmes at openjdk.java.net Fri Nov 5 05:03:09 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 5 Nov 2021 05:03:09 GMT Subject: RFR: JDK-8272065: jcmd cannot rely on the old core reflection implementation which will be changed after JEP 416 In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 13:25:14 GMT, Thomas Stuefe wrote: > `VM.metaspace`, `VM.classloaders` and `VM.class_hierarchy` all print out reflection invocation targets for delegating reflection class loaders. Post JEP 416 we don't use DelegatingClassLoaders anymore. > > This patch removes the display of reflection targets from these commands as well as associated helper code and tests. 
> > I don't have enough time atm to reimplement this feature using method handles. But at least we can remove the old code, and prepare the way for more code removal. > > The patch does not touch vmClasses, `reflect_ConstructorAccessor` and `reflect_MethodAccessor` are both still there. > > Tests: GHAs, manually testing the commands. I never realized we needed special handling for these classloaders so I'm glad to see this gone too. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6257 From stuefe at openjdk.java.net Fri Nov 5 05:19:15 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 5 Nov 2021 05:19:15 GMT Subject: RFR: JDK-8272065: jcmd cannot rely on the old core reflection implementation which will be changed after JEP 416 In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 17:02:05 GMT, Mandy Chung wrote: >> `VM.metaspace`, `VM.classloaders` and `VM.class_hierarchy` all print out reflection invocation targets for delegating reflection class loaders. Post JEP 416 we don't use DelegatingClassLoaders anymore. >> >> This patch removes the display of reflection targets from these commands as well as associated helper code and tests. >> >> I don't have enough time atm to reimplement this feature using method handles. But at least we can remove the old code, and prepare the way for more code removal. >> >> The patch does not touch vmClasses, `reflect_ConstructorAccessor` and `reflect_MethodAccessor` are both still there. >> >> Tests: GHAs, manually testing the commands. > > Looks good to me. Thanks for following this up. The new implementation does not spin any new class loader and so I don't think jcmd needs to extend its support for the new implementation using method handles. Thanks @mlchung, @coleenp and @dholmes-ora. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6257 From stuefe at openjdk.java.net Fri Nov 5 05:19:15 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 5 Nov 2021 05:19:15 GMT Subject: Integrated: JDK-8272065: jcmd cannot rely on the old core reflection implementation which will be changed after JEP 416 In-Reply-To: References: Message-ID: <7mVhoO4DH6O8bRsa1YBfnlRFAkY3UXeR1v_wPZzgsF0=.7d26c042-2d77-428a-bfb5-dd9b25e4aaa6@github.com> On Thu, 4 Nov 2021 13:25:14 GMT, Thomas Stuefe wrote: > `VM.metaspace`, `VM.classloaders` and `VM.class_hierarchy` all print out reflection invocation targets for delegating reflection class loaders. Post JEP 416 we don't use DelegatingClassLoaders anymore. > > This patch removes the display of reflection targets from these commands as well as associated helper code and tests. > > I don't have enough time atm to reimplement this feature using method handles. But at least we can remove the old code, and prepare the way for more code removal. > > The patch does not touch vmClasses, `reflect_ConstructorAccessor` and `reflect_MethodAccessor` are both still there. > > Tests: GHAs, manually testing the commands. This pull request has now been integrated. 
Changeset: 7281861e Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/7281861e0662e6c51507066a1f12673a236c7491 Stats: 368 lines in 8 files changed: 0 ins; 367 del; 1 mod 8272065: jcmd cannot rely on the old core reflection implementation which will be changed after JEP 416 Reviewed-by: mchung, coleenp, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/6257 From qingfeng.yy at alibaba-inc.com Fri Nov 5 08:34:25 2021 From: qingfeng.yy at alibaba-inc.com (Yi Yang) Date: Fri, 05 Nov 2021 16:34:25 +0800 Subject: Re: [External] : Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd In-Reply-To: References: <7f2df6ad-7d73-46ac-a23e-959fd6b4d4af.denghui.ddh@alibaba-inc.com> <6581e9e4-851a-c562-37af-915ff3fdc492@oracle.com> <8a5c6087-a131-4ddd-9195-0f1e51705351.denghui.ddh@alibaba-inc.com>, Message-ID: Hi all, I had an offline discussion about this with Denghui. When I first heard this idea, I felt it was useful. It lets users do, in a simple way, things that would otherwise require a lot of effort. I've also been following the discussion on the mailing list, and I've seen many folks come up with very constructive comments and questions/concerns. To keep the follow-up discussion simple, I want to try to summarize and give some answers on my own behalf. Each headline is a question or concern that folks have raised, followed by my personal opinion on it. I'd appreciate it if you could add anything I've missed. === What is it? It provides the ability for users to trigger predefined callbacks while the application is running. === May it be misused? It is provided through jcmd, so this ability should ideally be used for debugging/development/diagnosis purposes. It may be misused, but this is beyond our control, just as users can use a signal handler to download an app and play a song. === Maintainability? 
It extends the current jcmd implementation rather than significantly modifying it, so maintainability should be OK, IMHO.

=== Safety?
Undeniably, it may raise some potential security issues.

=== Alternatives?
Socket: Doing the same thing is less convenient for users, because they have to write a lot of boilerplate socket code.
Signal: Not open to users, a limited number of signals, and more likely to be misused.

=== Purpose?
1. I have a web application that can analyze Java heap dumps. I hope to provide a simple way to report runtime app metrics, such as disk usage and online worker load, instead of writing a complete web page and providing an admin page to access it. This information can also be gathered on other monitoring platforms.
2. Trigger DEBUG functionality while the application is running, for example to output some debug logs.

Best regards.

------------------------------------------------------------------
From:Chris Plummer
Send Time:2021 Nov. 4 (Thu.) 14:10
To:dong denghui ; serviceability-dev ; hotspot-dev
Subject:Re: [External] : Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd

Hi Denghui,

Yes, there are other ways the same thing could be accomplished like sockets or signals, but all of this is outside of the purview of the JDK, and therefore we don't become responsible for its design, maintenance, and potential security concerns. EnableUserLevelDCmd doesn't really fix any of these concerns, because an app can just always launch with this flag enabled. It really should be reserved for launching a JVM for the specific purpose of gathering some extra diagnostic data, but there is no way to enforce that.

Anyway, I'm not the gatekeeper on this. Just expressing some of my concerns. Others have done the same. I think we've seen a lack of enthusiasm in favor of doing this except from you. It would be good to see input from others that would like this feature in place.
cheers,

Chris

On 11/1/21 8:09 PM, Denghui Dong wrote:

Hi Chris,

Thank you for the comments.

Yes, we have no good way to restrict user-registered commands to diagnosis-related operations only, but in my opinion, this does not seem to be a problem that must be solved perfectly. The following are my thoughts.

This extension is an entry point that triggers the operation that the user wants to perform (similar to the signal handler mechanism, but with a name and parameters). Even without this extension, the user has other ways to achieve the same goal.

On the one hand, we could standardize the usage scenarios of the API in the documentation. (Indeed, users can still write programs that do not follow the specification; for example, users can implement an object's hashCode method so that multiple calls return different values, or make an object alive again while its finalize method is executing.)

On the other hand, we can add some restrictions to help users make better use of this extension. E.g., we can add a new VM option, such as EnableUserLevelDCmd, so that the application can only register custom commands when this option is enabled.

Or, from another perspective, can we allow users to do some non-diagnostic operations in custom commands?

Best,
Denghui

------------------------------------------------------------------
From:Chris Plummer
Send Time:2021 Nov. 2 (Tue.) 03:35
To:???(??) ; serviceability-dev ; hotspot-dev
Subject:Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd

I have similar concerns to those others have expressed, so I'll try to add something new to the discussion and not just repeat. DCMDs have historically been very VM centric. That's not to say they aren't useful for debugging applications, but they do so by providing VM related info like stack traces, heap dumps, and class histograms.
Also hotspot has been the gatekeeper for new DCMDs, meaning that new ones do not get added without going through the hotspot review process. Allowing any application or framework to add a DCMD changes this VM centric view in a way that concerns me.

This approach allows a DCMD to pretty much do anything (java security notwithstanding). App writers could even use them to provide a user facing interface. For example, if an app has some sort of internal database, it could allow users to query it via a DCMD, and maybe even suggest that users write simple shell scripts that use jcmd to do these queries. Allowing this type of non-diagnostic usage seems like a path we don't want to go down, yet I don't see how it can be prevented once you allow applications to add DCMDs.

Chris

On 10/25/21 1:37 AM, Denghui Dong wrote:

Hi there!

We'd like to discuss a proposal for extending the current DCmd framework to support Java level DCmd.

At present, DCmd only allows the VM to register commands, which can be called through jcmd or JMX. It would be beneficial if users could create their own commands.

The idea of this extension originally came from our internal Java agent that detects misuse of the Unsafe API. This agent can collect the call sites that allocate or free direct memory in the application (NMT could not do it, IMO) to detect direct memory leaks.

In the beginning, it just printed all call sites without any statistical function, which made it hard to use. So we plan to use an approach similar to jeprof (from jemalloc) to generate a report file that aggregates all useful information.

During the implementation process, we found that we need a mechanism to notify the agent to generate reports. The common practices are:

a) Register a service port, triggered by an HTTP request
b) Triggered by signal
c) Generate reports periodically, or when the process exits

But these three ways have certain problems.
For a) we need to introduce a network component, which will increase the complexity of the implementation
For b) we cannot pass parameters
For c) some files that may never be used will be generated

Essentially, the question is how to notify the application to do a certain task, or in other words, how to issue a command to the application.

We believe that other Java developers will also encounter similar problems. (And sometimes there may be multiple unrelated dependent components in a Java application that require such a mechanism.)

Naturally, we thought: jcmd can already issue commands registered in the VM to the application, so why can't we extend this to the Java level?

This feature will be very useful for some lightweight tools, just like the scenario we encountered, to notify the tools to perform certain operations.

In addition, this feature will also benefit Java beginners. For example, beginners may not use advanced logging components at first, but they will still need to output debug logs. They may write code like this:

```
if (debug) {
    System.out.println("...");
}
```

It would be attractive if developers could easily control the value of debug, like this:

```
Factory.register("MyApp.flipDebug", out -> debug = !debug);

jcmd MyApp.flipDebug
```

For mainstream frameworks, we could apply this feature to trigger some common activities, such as health checks, graceful shutdown, and dynamic configuration updates. But to be honest, these frameworks are very mature and stable, and for compatibility purposes, it's hard to get them to use this extension.

Comments welcome!
Thanks, Denghui From chagedorn at openjdk.java.net Fri Nov 5 09:16:12 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 5 Nov 2021 09:16:12 GMT Subject: RFR: 8276571: C2: pass compilation options as structure [v3] In-Reply-To: References: Message-ID: On Thu, 4 Nov 2021 16:39:45 GMT, Vladimir Kozlov wrote: >> Currently we pass several compilation options as separate arguments to `Compile`: >> >> Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); >> >> Originally we had only `subsume_loads` option but we added few since then and we may add more. >> >> I suggest to add new `Options` class to pass these values into `Compile`. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Per-use Options factory method Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6237 From lkorinth at openjdk.java.net Fri Nov 5 09:30:15 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Fri, 5 Nov 2021 09:30:15 GMT Subject: RFR: 8275506: Rename allocated_on_stack to allocated_on_stack_or_embedded [v2] In-Reply-To: References: Message-ID: On Fri, 29 Oct 2021 13:29:40 GMT, Leo Korinth wrote: >> In allocation.hpp, the name allocated_on_stack can be misleading, better rename the function to allocated_on_stack_or_embedded and it will match the name of the enum as a bonus. > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > restart failed github tests Thanks Thomas. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/6004

From lkorinth at openjdk.java.net Fri Nov 5 09:30:15 2021
From: lkorinth at openjdk.java.net (Leo Korinth)
Date: Fri, 5 Nov 2021 09:30:15 GMT
Subject: Integrated: 8275506: Rename allocated_on_stack to allocated_on_stack_or_embedded
In-Reply-To:
References:
Message-ID:

On Tue, 19 Oct 2021 12:18:30 GMT, Leo Korinth wrote:

> In allocation.hpp, the name allocated_on_stack can be misleading, better rename the function to allocated_on_stack_or_embedded and it will match the name of the enum as a bonus.

This pull request has now been integrated.

Changeset: 323d2017
Author: Leo Korinth
URL: https://git.openjdk.java.net/jdk/commit/323d2017959dc96d25eaa1aad6404586099c237e
Stats: 16 lines in 6 files changed: 0 ins; 0 del; 16 mod

8275506: Rename allocated_on_stack to allocated_on_stack_or_embedded

Reviewed-by: stuefe

-------------

PR: https://git.openjdk.java.net/jdk/pull/6004

From shade at openjdk.java.net Fri Nov 5 10:13:08 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Fri, 5 Nov 2021 10:13:08 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace
In-Reply-To:
References:
Message-ID:

On Thu, 7 Oct 2021 12:42:48 GMT, Aleksey Shipilev wrote:

> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
>
> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers.
>
> Additional testing:
> - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now pass
> - [x] Linux x86_64 Zero works with `async-profiler`

Does anyone have opinions about this patch?
:) ------------- PR: https://git.openjdk.java.net/jdk/pull/5848 From mcimadamore at openjdk.java.net Fri Nov 5 11:06:53 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Fri, 5 Nov 2021 11:06:53 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v18] In-Reply-To: References: Message-ID: <4iHYQMJoHZwfRJCHV9tYB_5t92pjEsgISw_d9_Nt6H8=.1fb75c40-a6ee-498a-9d4e-cf1b6d11b583@github.com> > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: * Add two new CLinker static methods to compute upcall/downcall method types * Clarify section on CLinker downcall type * Add section on CLinker safety guarantees ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/b9432473..ce561e1f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=17 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=16-17 Stats: 79 lines in 3 files changed: 47 ins; 17 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From mcimadamore at openjdk.java.net Fri Nov 5 11:30:59 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Fri, 5 Nov 2021 11:30:59 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v19] In-Reply-To: References: Message-ID: > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Streamline javadoc for package-info ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/ce561e1f..350f1f07 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=18 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=17-18 Stats: 37 lines in 1 file changed: 9 ins; 3 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From mcimadamore at openjdk.java.net Fri Nov 5 11:37:57 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Fri, 5 Nov 2021 11:37:57 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v20] In-Reply-To: References: Message-ID: <1EDavlhSqnzIbpu1uQArxPknmjIMaeQEoPV8W1T3UjE=.9a5dcd88-0cc7-4965-b2b7-3cccaf70b50e@github.com> > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Rename MemorySegment::ofAddressNative to MemorySegment::ofAddress (which is consistent with other restricted factories in VaList and NativeSymbol) ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/350f1f07..663e72a8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=19 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=18-19 Stats: 51 lines in 23 files changed: 0 ins; 3 del; 48 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From mcimadamore at openjdk.java.net Fri Nov 5 11:54:14 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Fri, 5 Nov 2021 11:54:14 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v12] In-Reply-To: References: Message-ID: On Tue, 2 Nov 2021 15:40:45 GMT, Paul Sandoz wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweak javadoc of loaderLookup > > Marked as reviewed by psandoz (Reviewer). I have made some minor API changes (added two methods to `CLinker` to return the upcall and downcall method types, as suggested offline by @PaulSandoz). I've also cleaned up the `CLinker` javadoc, and added a section on safety consideration, streamlined the links in the package-level javadoc and renamed `MemorySegment::ofAddressNative` to simply `MemorySegment::ofAddress` (which is consistent with restricted factories in `NativeSymbol` and `VaList`). 
javadoc: http://cr.openjdk.java.net/~mcimadamore/JEP-419/v2/javadoc/jdk/incubator/foreign/package-summary.html specdiff: http://cr.openjdk.java.net/~mcimadamore/JEP-419/v2/specdiff_out/overview-summary.html ------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From jvernee at openjdk.java.net Fri Nov 5 14:29:19 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Fri, 5 Nov 2021 14:29:19 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v18] In-Reply-To: <4iHYQMJoHZwfRJCHV9tYB_5t92pjEsgISw_d9_Nt6H8=.1fb75c40-a6ee-498a-9d4e-cf1b6d11b583@github.com> References: <4iHYQMJoHZwfRJCHV9tYB_5t92pjEsgISw_d9_Nt6H8=.1fb75c40-a6ee-498a-9d4e-cf1b6d11b583@github.com> Message-ID: On Fri, 5 Nov 2021 11:06:53 GMT, Maurizio Cimadamore wrote: >> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.java.net/jeps/419 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > * Add two new CLinker static methods to compute upcall/downcall method types > * Clarify section on CLinker downcall type > * Add section on CLinker safety guarantees src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java line 65: > 63: *

if {@code L} is a {@link ValueLayout} with carrier {@code E} then there are two cases:
> 64: *
> 65: * if {@code L} occurs in a parameter position and {@code E} is {@code NativeAddress.class},

This looks spurious

src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java line 134:

> 132: *

      > 133: * Upcall stubs are generally safer to work with, as the linker runtime can validate the type of the target method > 134: * handle against the provided function descriptor and report an error if any mismatch is detected. If the target method But, in the case of upcalls, errors can still occur if the native code casts the pointer to the upcall stub to an incorrect type, e.g. `FunctionDescriptor.ofVoid(ADDRESS, ADDRESS)`, but on the native side cast it to `void (*)(void*)`, meaning the second argument would be garbage on the Java side. i.e. there is still room for a mismatch the same as with downcalls. src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java line 267: > 265: static MethodType upcallType(FunctionDescriptor functionDescriptor) { > 266: return SharedUtils.inferMethodType(functionDescriptor, true); > 267: } Nice! :) ------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From mcimadamore at openjdk.java.net Fri Nov 5 14:37:23 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Fri, 5 Nov 2021 14:37:23 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v18] In-Reply-To: References: <4iHYQMJoHZwfRJCHV9tYB_5t92pjEsgISw_d9_Nt6H8=.1fb75c40-a6ee-498a-9d4e-cf1b6d11b583@github.com> Message-ID: On Fri, 5 Nov 2021 14:25:35 GMT, Jorn Vernee wrote: >> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: >> >> * Add two new CLinker static methods to compute upcall/downcall method types >> * Clarify section on CLinker downcall type >> * Add section on CLinker safety guarantees > > src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java line 134: > >> 132: *

      >> 133: * Upcall stubs are generally safer to work with, as the linker runtime can validate the type of the target method >> 134: * handle against the provided function descriptor and report an error if any mismatch is detected. If the target method > > But, in the case of upcalls, errors can still occur if the native code casts the pointer to the upcall stub to an incorrect type, e.g. `FunctionDescriptor.ofVoid(ADDRESS, ADDRESS)`, but on the native side cast it to `void (*)(void*)`, meaning the second argument would be garbage on the Java side. i.e. there is still room for a mismatch the same as with downcalls. Yes and no. In a downcall, you just don't know what signature the downcall will feature in the native lib. So you pass a function descriptor and you hope it's ok. In the upcall case you _do_ know the signature of the Java upcall code you want to call, so you can validate the descriptor against that. Of course the native code can still cast things around in ways that blow things up, but the two problems seem somewhat different, at least to me. But I can tweak the text a bit. ------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From mcimadamore at openjdk.java.net Fri Nov 5 15:28:45 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Fri, 5 Nov 2021 15:28:45 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v21] In-Reply-To: References: Message-ID: <_YLlQk23TfRkCzouXvgHH3Zxktw1sxo1uvae5KsjlFw=.3c4f3aeb-e24f-424b-94f4-04b19f0e834b@github.com> > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
> > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Clarify safety considerations for upcalls ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/663e72a8..2aa126a9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=20 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=19-20 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From jvernee at openjdk.java.net Fri Nov 5 15:52:16 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Fri, 5 Nov 2021 15:52:16 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v18] In-Reply-To: References: <4iHYQMJoHZwfRJCHV9tYB_5t92pjEsgISw_d9_Nt6H8=.1fb75c40-a6ee-498a-9d4e-cf1b6d11b583@github.com> Message-ID: On Fri, 5 Nov 2021 14:33:44 GMT, Maurizio Cimadamore wrote: >> src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java line 134: >> >>> 132: *

      >>> 133: * Upcall stubs are generally safer to work with, as the linker runtime can validate the type of the target method >>> 134: * handle against the provided function descriptor and report an error if any mismatch is detected. If the target method >> >> But, in the case of upcalls, errors can still occur if the native code casts the pointer to the upcall stub to an incorrect type, e.g. `FunctionDescriptor.ofVoid(ADDRESS, ADDRESS)`, but on the native side cast it to `void (*)(void*)`, meaning the second argument would be garbage on the Java side. i.e. there is still room for a mismatch the same as with downcalls. > > Yes and no. In a downcall, you just don't know what signature the downcall will feature in the native lib. So you pass a function descriptor and you hope it's ok. In the upcall case you _do_ know the signature of the Java upcall code you want to call, so you can validate the descriptor against that. Of course the native code can still cast things around in ways that blow things up, but the two problems seem somewhat different, at least to me. But I can tweak the text a bit. Ok, thanks. I think of it more like this: in both cases we specify a native type as well as a Java type, both in the form of a FunctionDescriptor, from which we then derive the Java type in the form of a MethodType. If there is a mismatch here with what the native code does we are in trouble, this seems the same for downcalls and upcalls. In both cases we know the Java side for sure, it's the native side we can't validate (they are just flipped around for upcalls). But, for upcalls there is an additional thing that can go wrong: the type of the target MethodHandle we pass could have a mismatch with the type we inferred from the FunctionDescriptor, so there we need to do an extra check. i.e. 
in a way this seems _less_ safe (though a different kind of safety), than downcalls, since there is an additional way to mess up with the linkage request, although we can catch that case. ------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From mcimadamore at openjdk.java.net Fri Nov 5 16:02:43 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Fri, 5 Nov 2021 16:02:43 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v22] In-Reply-To: References: Message-ID: > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Further tweak upcall safety considerations ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/2aa126a9..4e3af9f1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=21 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=20-21 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From kvn at openjdk.java.net Fri Nov 5 16:11:18 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 5 Nov 2021 16:11:18 GMT Subject: RFR: 8276571: C2: pass compilation options as structure [v3] In-Reply-To: References: Message-ID: <9Vo2Sucm_eCSVFaAB_G7_KRKdoq4z3y2BmLhxT8rREk=.14cf2e9a-b74c-4f7d-9d7d-6075f4e6cb96@github.com> On Thu, 4 Nov 2021 16:39:45 GMT, Vladimir Kozlov wrote: >> Currently we pass several compilation options as separate arguments to `Compile`: >> >> Compile C(env, target, entry_bci, 
subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); >> >> Originally we had only `subsume_loads` option but we added few since then and we may add more. >> >> I suggest to add new `Options` class to pass these values into `Compile`. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Per-use Options factory method Thank you, Aleksey and Christian. ------------- PR: https://git.openjdk.java.net/jdk/pull/6237 From kvn at openjdk.java.net Fri Nov 5 16:11:18 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 5 Nov 2021 16:11:18 GMT Subject: Integrated: 8276571: C2: pass compilation options as structure In-Reply-To: References: Message-ID: On Wed, 3 Nov 2021 18:49:47 GMT, Vladimir Kozlov wrote: > Currently we pass several compilation options as separate arguments to `Compile`: > > Compile C(env, target, entry_bci, subsume_loads, do_escape_analysis, eliminate_boxing, do_locks_coarsening, install_code, directive); > > Originally we had only `subsume_loads` option but we added few since then and we may add more. > > I suggest to add new `Options` class to pass these values into `Compile`. This pull request has now been integrated. 
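For readers skimming the archive, the refactoring described in this PR can be sketched roughly as follows. This is a minimal illustration and not the HotSpot code: `Options`, its factory methods `all_enabled()` and `without_escape_analysis()`, and the stand-in class `CompileSketch` are hypothetical names made up for this sketch; only the flag names come from the PR description.

```cpp
#include <cassert>

// Hypothetical grouping of the boolean flags named in the PR description.
struct Options {
  bool subsume_loads;
  bool do_escape_analysis;
  bool eliminate_boxing;
  bool do_locks_coarsening;
  bool install_code;

  // Per-use factory (hypothetical): every flag enabled.
  static Options all_enabled() {
    return Options{true, true, true, true, true};
  }

  // Per-use factory (hypothetical): same as above, minus escape analysis.
  static Options without_escape_analysis() {
    Options o = all_enabled();
    o.do_escape_analysis = false;
    return o;
  }
};

// Stand-in for the real Compile class; it only stores what it is given.
class CompileSketch {
 public:
  CompileSketch(int entry_bci, const Options& options)
      : _entry_bci(entry_bci), _options(options) {}

  int entry_bci() const { return _entry_bci; }
  const Options& options() const { return _options; }

 private:
  int _entry_bci;
  Options _options;
};

// Example call site: one named bundle instead of five positional booleans.
inline bool example_installs_code() {
  CompileSketch c(/*entry_bci=*/-1, Options::without_escape_analysis());
  assert(!c.options().do_escape_analysis);
  return c.options().install_code;
}
```

The point of the design is that a call site names the configuration it wants, so adding a sixth flag later would touch the struct and its factories rather than every constructor call.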
Changeset: a74a839a
Author: Vladimir Kozlov
URL: https://git.openjdk.java.net/jdk/commit/a74a839af02446d322d77c6e546e652ec6ad5d73
Stats: 76 lines in 4 files changed: 40 ins; 17 del; 19 mod

8276571: C2: pass compilation options as structure

Reviewed-by: shade, chagedorn

-------------

PR: https://git.openjdk.java.net/jdk/pull/6237

From stuefe at openjdk.java.net Sat Nov 6 05:49:45 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Sat, 6 Nov 2021 05:49:45 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks
In-Reply-To:
References:
Message-ID:

On Thu, 14 Oct 2021 15:49:05 GMT, Thomas Stuefe wrote:

> This is part of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot. For the whole story please refer to https://bugs.openjdk.java.net/browse/JDK-8275301.
>
> This proposal adds NMT buffer overflow checking. As laid out in JDK-8275301:
>
> - it would give us C-heap overflow checking in release builds
> - the additional costs are negligible
> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. The error reports would also be confusing.
> - it is a preparation for future code removal (the memory guarding done in debug only in os::malloc() and friends, and possibly the guarding done with CheckJNICalls)
>
> Patch notes:
>
> 1) The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation.
>
> On 64-bit, we don't even need to enlarge the malloc header: we carve some bits out by decreasing the size of the bucket index bit field to 16 bits. The bucket index field is used to store the bucket slot of the malloc site table in NMT detail mode. The malloc site table width is 512 atm, so 65k gives plenty of room for growing the malloc site table should we ever want to.
>
> On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes.
That is because there were not enough bits to spare for a canary. On the upside, 8 bytes were not enough anyway, strictly speaking, to guarantee proper alignment e.g. for 128-bit data types on all 32-bit platforms. See e.g. the malloc alignment the glibc uses.
>
> I also took the freedom of re-arranging the malloc header fields a bit to minimize the difference between 32-bit and 64-bit platforms, and to align each field optimally according to its size. I also switched from bitfields to real types in order to be able to do a sizeof() on them.
>
> For more details, see the comment in mallocTracker.hpp.
>
> 2) I added a footer canary trailing the user allocation to catch tail buffer overruns. For simplicity reasons (alignment) and to save some cycles I made it a byte only. That is enough to catch most overrun scenarios. If you think this is too small, I'm open to changing it.
>
> 3) I put a bit of work into error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
>
> 4) I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
>
> (Note that these gtests, to test anything, need to run with NMT switched on. We do this as part of our NMT jtreg-controlled gtests in tier1).
>
> Even though the patch adds more code than it removes, it prepares possible code removal (if we can agree to do that) and the net result will be less complexity, not more. Again, see JDK-8275301 for details.
>
> --------------
>
> Example output a buffer overrun would provide:
>
>
> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1:
> 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
> 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
> 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
> # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> #
>
> -------
>
> Tests:
> - manual tests with Linux x64, x86, minimal build
> - GHAs all clean
> - SAP nightlies ran for 14 days in a row without problems

No takers?
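
The header-plus-footer canary scheme described in the patch notes can be illustrated with a small standalone sketch. It is not the NMT implementation: the `Header` layout, the canary values, and the helpers `guarded_malloc`, `check_block`, and `guarded_free` are all hypothetical and exist only to show the idea of a 16-bit canary directly before the payload and a one-byte canary directly after it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical canary values; any fixed, unlikely bit patterns would do.
constexpr uint16_t kHeaderCanary = 0xE99E;
constexpr uint8_t  kFooterCanary = 0xE8;

// Hypothetical 16-byte header. The canary is the last field, so it sits
// directly in front of the user payload.
struct alignas(16) Header {
  uint64_t size;    // payload size in bytes
  uint32_t flags;   // unused in this sketch
  uint16_t bucket;  // unused in this sketch
  uint16_t canary;  // must equal kHeaderCanary
};
static_assert(sizeof(Header) == 16, "canary must end the header");

enum class BlockState { ok, header_broken, footer_broken };

// Layout of one block: [Header][payload ...][1-byte footer canary].
inline void* guarded_malloc(size_t size) {
  auto* raw = static_cast<unsigned char*>(std::malloc(sizeof(Header) + size + 1));
  if (raw == nullptr) return nullptr;
  Header h{static_cast<uint64_t>(size), 0, 0, kHeaderCanary};
  std::memcpy(raw, &h, sizeof h);
  raw[sizeof(Header) + size] = kFooterCanary;
  return raw + sizeof(Header);
}

// Inspects both canaries of a block returned by guarded_malloc.
inline BlockState check_block(const void* payload) {
  const auto* p = static_cast<const unsigned char*>(payload);
  Header h;
  std::memcpy(&h, p - sizeof(Header), sizeof h);
  if (h.canary != kHeaderCanary) return BlockState::header_broken;
  if (p[h.size] != kFooterCanary) return BlockState::footer_broken;
  return BlockState::ok;
}

// Frees a block, asserting first that neither canary was overwritten.
inline void guarded_free(void* payload) {
  if (payload == nullptr) return;
  assert(check_block(payload) == BlockState::ok);
  std::free(static_cast<unsigned char*>(payload) - sizeof(Header));
}
```

In this sketch a write one byte past an 8-byte payload lands on the footer canary, so `check_block` would report `footer_broken`; a real implementation would additionally print a hex dump around the corruption, as the example output above shows.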
-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From iklam at openjdk.java.net Sun Nov 7 21:23:52 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Sun, 7 Nov 2021 21:23:52 GMT
Subject: RFR: 8269986: Remove +3 from Symbol::identity_hash()
Message-ID:

Please review this change that removes the `+3` from here:

    unsigned Symbol::identity_hash() const {
      unsigned addr_bits = (unsigned)((uintptr_t)this >> (LogMinObjAlignmentInBytes + 3));
                                                                                    ^^^
      return ((unsigned)extract_hash(_hash_and_refcount) & 0xffff) |
             ((addr_bits ^ (length() << 8) ^ (( _body[0] << 8) | _body[1])) << 16);
    }

The `+3` was intended to avoid getting the same value for these bits:

    ((((uintptr_t)this) >> LogMinObjAlignmentInBytes) & 0x07)

However, as shown in the [bug report](https://bugs.openjdk.java.net/browse/JDK-8269986), the values for these bits are evenly distributed. So the `+3` is not necessary and may actually be counter-productive.

Testing: Oracle CI tiers 1-4

-------------

Commit messages:
 - 8269986: Remove +3 from Symbol::identity_hash()

Changes: https://git.openjdk.java.net/jdk/pull/6287/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6287&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8269986
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6287.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6287/head:pull/6287

PR: https://git.openjdk.java.net/jdk/pull/6287

From duke at openjdk.java.net Mon Nov 8 00:30:39 2021
From: duke at openjdk.java.net (Joshua Cao)
Date: Mon, 8 Nov 2021 00:30:39 GMT
Subject: RFR: 8274860: gcc 10.2.1 produces an uninitialized warning in sharedRuntimeTrig.cpp
In-Reply-To:
References:
Message-ID:

On Tue, 2 Nov 2021 23:39:48 GMT, Joshua Cao wrote:

> Initialize `fq` to an array of zeroes.

I've taken a look at the discussion on the JBS issue again. I'm not sure why it was determined that this change should be applied to tip.
I've tried to build locally for JDK tip and JDK15, and there is no uninitialized warning. There is a link in the JBS description explaining where the warning is disabled. I think this issue should be closed, and I'll update https://github.com/openjdk/jdk11u-dev/pull/489. ------------- PR: https://git.openjdk.java.net/jdk/pull/6220 From duke at openjdk.java.net Mon Nov 8 00:30:39 2021 From: duke at openjdk.java.net (Joshua Cao) Date: Mon, 8 Nov 2021 00:30:39 GMT Subject: Withdrawn: 8274860: gcc 10.2.1 produces an uninitialized warning in sharedRuntimeTrig.cpp In-Reply-To: References: Message-ID: On Tue, 2 Nov 2021 23:39:48 GMT, Joshua Cao wrote: > Initialize `fq` to an array to zeroes. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/6220 From ddong at openjdk.java.net Mon Nov 8 01:26:37 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 8 Nov 2021 01:26:37 GMT Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to SharedRuntime::dtrace_object_alloc(_base) [v2] In-Reply-To: References: Message-ID: On Sun, 31 Oct 2021 22:56:44 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base). >> >> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modified the callsites(interpreter/c1/c2). >> >> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc. >> >> Thanks, >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > fix build problem Gentle ping? This problem seems to have existed for a long time. I think it's because there are few users, so it's not reported As far as I know, a BCC's tool relies on this probe. 
https://github.com/iovisor/bcc/blob/master/tools/lib/uobjnew.py#L110 ------------- PR: https://git.openjdk.java.net/jdk/pull/6181 From dholmes at openjdk.java.net Mon Nov 8 01:54:34 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 8 Nov 2021 01:54:34 GMT Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to SharedRuntime::dtrace_object_alloc(_base) [v2] In-Reply-To: References: Message-ID: On Sun, 31 Oct 2021 22:56:44 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base). >> >> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modified the callsites(interpreter/c1/c2). >> >> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc. >> >> Thanks, >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > fix build problem To me something like `dtrace_object_alloc_base` should not be called directly (like a foo_impl function) but only as the implementation of the real API entry points. If that isn't the case here then lets drop the "base" part and just have a set of overloaded `dtrace_object_alloc` functions. Thanks, David ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6181 From ddong at openjdk.java.net Mon Nov 8 02:42:58 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 8 Nov 2021 02:42:58 GMT Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to SharedRuntime::dtrace_object_alloc(_base) [v3] In-Reply-To: References: Message-ID: > Hi, > > Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base). 
> > JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modified the callsites(interpreter/c1/c2). > > To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc. > > Thanks, > Denghui Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: update according to comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6181/files - new: https://git.openjdk.java.net/jdk/pull/6181/files/8d597ebc..0527097e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6181&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6181&range=01-02 Stats: 17 lines in 13 files changed: 0 ins; 0 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/6181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6181/head:pull/6181 PR: https://git.openjdk.java.net/jdk/pull/6181 From ddong at openjdk.java.net Mon Nov 8 02:42:58 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 8 Nov 2021 02:42:58 GMT Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to SharedRuntime::dtrace_object_alloc(_base) [v2] In-Reply-To: References: Message-ID: On Mon, 8 Nov 2021 01:51:30 GMT, David Holmes wrote: > To me something like `dtrace_object_alloc_base` should not be called directly (like a foo_impl function) but only as the implementation of the real API entry points. If that isn't the case here then lets drop the "base" part and just have a set of overloaded `dtrace_object_alloc` functions. > > Thanks, David Changed. 
Thanks, Denghui ------------- PR: https://git.openjdk.java.net/jdk/pull/6181 From dholmes at openjdk.java.net Mon Nov 8 05:23:38 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 8 Nov 2021 05:23:38 GMT Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to SharedRuntime::dtrace_object_alloc(_base) [v3] In-Reply-To: References: Message-ID: <2selTWcClSy4aHruUOuUlErZUVny8-VgrladwcRJVm4=.830e17bf-d507-452e-a37f-f6c819666955@github.com> On Mon, 8 Nov 2021 02:42:58 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base). >> >> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modified the callsites(interpreter/c1/c2). >> >> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc. >> >> Thanks, >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update according to comments Did your change actually work? I just realized that you can't use this: ```__ call(RuntimeAddress(CAST_FROM_FN_PTR(address, static_cast(SharedRuntime::dtrace_object_alloc))));``` because it has no idea what overload of `dtrace_object_alloc` needs to be invoked. David ------------- PR: https://git.openjdk.java.net/jdk/pull/6181 From dholmes at openjdk.java.net Mon Nov 8 05:36:33 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 8 Nov 2021 05:36:33 GMT Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to SharedRuntime::dtrace_object_alloc(_base) [v3] In-Reply-To: References: Message-ID: On Mon, 8 Nov 2021 02:42:58 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base). 
>> >> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modified the callsites(interpreter/c1/c2). >> >> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc. >> >> Thanks, >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update according to comments Sorry ignore that. I see that is what the static_cast is intended to do. I got confused by the need to change the additional call-sites that already used `dtrace_object_alloc`, because they were the ones previously calling the two-arg function but only passing one arg! ------------- PR: https://git.openjdk.java.net/jdk/pull/6181 From dholmes at openjdk.java.net Mon Nov 8 05:52:34 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 8 Nov 2021 05:52:34 GMT Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to SharedRuntime::dtrace_object_alloc(_base) [v3] In-Reply-To: References: Message-ID: On Mon, 8 Nov 2021 02:42:58 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base). >> >> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modified the callsites(interpreter/c1/c2). >> >> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc. >> >> Thanks, >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update according to comments I understand now what you meant by the additional overload complicating the fix - I hadn't appreciated that may have been the reason for using different names for the functions originally. I'm still unclear why the lack of the size argument did not cause problems? 
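
The overload selection discussed in this thread can be shown in isolation. What follows is a minimal sketch, assuming two hypothetical overloads named `record_alloc` and a stand-in type `Obj`; these are not the SharedRuntime declarations. It only illustrates why a `static_cast` to an exact function-pointer type is enough to pick one overload when the function's address is taken.

```cpp
#include <cstddef>

// Hypothetical stand-in for an object type; not oopDesc.
struct Obj {
  std::size_t size;
};

// Two overloads sharing one name, as suggested in the review.
inline int record_alloc(Obj* o) {
  return static_cast<int>(o->size);
}
inline int record_alloc(Obj* o, std::size_t size) {
  (void)o;  // the explicit size wins in this overload
  return static_cast<int>(size);
}

using OneArgFn = int (*)(Obj*);
using TwoArgFn = int (*)(Obj*, std::size_t);

// The bare name `record_alloc` refers to an overload set. Casting it to
// an exact function-pointer type tells the compiler which overload's
// address is wanted.
inline OneArgFn one_arg_entry() {
  return static_cast<OneArgFn>(record_alloc);
}
inline TwoArgFn two_arg_entry() {
  return static_cast<TwoArgFn>(record_alloc);
}
```

Without the cast, passing the bare name to something that accepts an untyped address gives the compiler no target type to resolve the overload against, which is the confusion raised earlier in the thread.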
I guess whatever random value was next on the stack got read as the size, but reading it caused no harm; it was just incorrect. These changes look good to me now. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6181 From tschatzl at openjdk.java.net Mon Nov 8 09:51:36 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 8 Nov 2021 09:51:36 GMT Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> Message-ID: On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li wrote: > Currently, Thread::_rcu_counter is not padded to a cache line; it should be beneficial to do so. > > The initial specjbb test shows about 10.5% improvement of critical, and 0.7% improvement of max in specjbb2015, specjbb arguments: > GROUP_COUNT=4 > TI_JVM_COUNT=1 > JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g" > MODE_ARGS="-ikv" Hi, we tried to reproduce your numbers internally, but failed to do so. Differences seem to be within noise. We tried with specjbb2015 multi-jvm on a fairly large machine (152 threads; it was what was "on hand") and multiple runs of our internal benchmarks and specjbb2015 composite runs on various (not-so-large but still fairly big) machines. Could you post or send more details about your configuration? The other concern that has been brought up internally has been that this increases the size of Thread by ~40% from 624 to 872 bytes; do you think there is a way to save some memory by reorganizing the fields so that the counter is on a separate cache line "naturally"?
Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/6246 From ddong at openjdk.java.net Mon Nov 8 12:11:33 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 8 Nov 2021 12:11:33 GMT Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to SharedRuntime::dtrace_object_alloc(_base) [v3] In-Reply-To: References: Message-ID: On Mon, 8 Nov 2021 02:42:58 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base). >> >> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modify the call sites (interpreter/c1/c2). >> >> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc. >> >> Thanks, >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update according to comments Thank you, David. Could I have another review? ------------- PR: https://git.openjdk.java.net/jdk/pull/6181 From mli at openjdk.java.net Mon Nov 8 12:36:36 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Mon, 8 Nov 2021 12:36:36 GMT Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com> Message-ID: <89OJo1V3vrvkTQ-dU-97B54AVaFV7eBlBe_vp6oXRmU=.f3918f6a-290c-4c85-b25b-3eef082a82fc@github.com> On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li wrote: > Currently, Thread::_rcu_counter is not padded to a cache line; it should be beneficial to do so. > > The initial specjbb test shows about 10.5% improvement of critical, and 0.7% improvement of max in specjbb2015.
>
>
> ========= test result (1st round) ==========
> rcu            base
> 45096          38980
> 41741          41468
> 42349          41053
> 44485          42030
> 47103          39915
> 43864          36004
>
> ==== average ====
> 44106.33333    39908.33333
>
> ==== improvement ====
> 10.5%
>
> ========= test result (2nd round) ==========
> The second round of runs includes 3 types:
> 1. pad gc data & pad rcu
> 2. pad rcu only
> 3. base
>
> Although the improvement is smaller than in the previous round (10%), it is still about 3~4%.
>
> gc data + rcu  rcu            base
> 41284          41860          37099
> 42296          42166          44692
> 42810          43423          41801
> 43492          45603          40274
> 43808          40641          39627
> 43029          40242          39793
> 42543          41662          41544
> 43420          42702          37991
> 44212          43354          40319
> 42692          43442          45264
> 44773          44577          44213
> 40835          41870          42008
> 44282          44167          42527
>
> ==== average ====
> 43036.61538    42746.84615    41319.38462
>
> ==== improvement ====
> gc data + rcu / base: 4.156%
> rcu / base: 3.45%
>
>
> ========= configuration and environment ==========
> specjbb arguments:
> GROUP_COUNT=4
> TI_JVM_COUNT=1
>
> SPEC_OPTS_C="-Dspecjbb.group.count=$GROUP_COUNT -Dspecjbb.txi.pergroup.count=$TI_JVM_COUNT"
> SPEC_OPTS_TI=""
> SPEC_OPTS_BE=""
>
> JAVA_OPTS_C="-server -Xms2g -Xmx2g -XX:+UseParallelGC"
> JAVA_OPTS_TI="-server -Xms2g -Xmx2g -XX:+UseParallelGC"
> JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g"
>
> MODE_ARGS_C="-ikv"
> MODE_ARGS_TI="-ikv"
> MODE_ARGS_BE="-ikv"
>
> NUM_OF_RUNS=1
>
> HW:
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                224
> On-line CPU(s) list:   0-223
> Thread(s) per core:    2
> Core(s) per socket:    28
> Socket(s):             4
> NUMA node(s):          4
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 85
> Model name:            Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz
> Stepping:              4
> CPU MHz:               1001.925
> CPU max MHz:           2101.0000
> CPU min MHz:           1000.0000
> BogoMIPS:              4200.00
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              1024K
> L3 cache:              39424K
> NUMA node0 CPU(s):     0-27,112-139
> NUMA node1 CPU(s):     28-55,140-167
> NUMA node2 CPU(s):     56-83,168-195
> NUMA node3 CPU(s):     84-111,196-223
>
>               total        used        free      shared  buff/cache   available
> Mem:           3.0T        3.8G        2.9T         18M         25G        2.9T
> Swap:           99G          0B         99G

Thanks, Thomas, for the feedback. I have updated the summary of this PR with more configuration and environment info, and I also added the results of a 2nd round of runs. Although the improvement is smaller than in the previous round (10%), it is still about 3~4%, and the data seems more stable than in the 1st round. (JBS is not available currently; I will update JBS later too.)

-------------

PR: https://git.openjdk.java.net/jdk/pull/6246

From mli at openjdk.java.net Mon Nov 8 12:41:34 2021
From: mli at openjdk.java.net (Hamlin Li)
Date: Mon, 8 Nov 2021 12:41:34 GMT
Subject: RFR: 8276618: Pad cacheline for Thread::_rcu_counter
In-Reply-To:
References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
Message-ID:

On Mon, 8 Nov 2021 09:48:24 GMT, Thomas Schatzl wrote:

> The other concern that has been brought up internally has been that this increases the size of Thread by ~40% from 624 to 872 bytes; do you think there is a way to save some memory by reorganizing the fields so that the counter is on a separate cache line "naturally"?

Sure, if the current change is proven to bring some performance benefit, I will do some more research in this direction.
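
The trade-off raised in the review, explicit padding versus arranging fields so that the counter ends up away from its noisy neighbours "naturally", can be sketched as follows. The struct and field names are hypothetical, the 64-byte line size is an assumption, and which fields count as rarely written is also an assumption; this is not the layout of Thread.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size

// Explicit padding: the hot counter gets a line to itself, and the
// unused remainder of two lines is pure padding.
struct Padded {
  std::uint64_t cold_a[4];                       // bytes 0..31
  alignas(kCacheLine) std::uint64_t hot;         // bytes 64..71
  alignas(kCacheLine) std::uint64_t cold_b[4];   // bytes 128..159
};

// Reordering: the hot counter leads the struct and the rest of its
// line is filled with fields assumed to be rarely written, so no
// bytes are spent on padding between fields.
struct Reordered {
  alignas(kCacheLine) std::uint64_t hot;         // bytes 0..7
  std::uint64_t quiet[7];                        // bytes 8..63
  std::uint64_t rest[1];                         // bytes 64..71, next line
};

// Index of the cache line that a given byte offset falls into.
constexpr std::size_t line_of(std::size_t offset) {
  return offset / kCacheLine;
}
```

Both structs hold nine 8-byte fields. Under these assumptions the padded layout occupies three lines and the reordered one two, which is the kind of saving the question about reorganizing fields is after. Whether it is safe depends entirely on the access pattern of the fields that share the counter's line.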
------------- PR: https://git.openjdk.java.net/jdk/pull/6246 From coleenp at openjdk.java.net Mon Nov 8 13:48:42 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 8 Nov 2021 13:48:42 GMT Subject: RFR: 8276209: Some call sites doesn't pass the parameter 'size' to SharedRuntime::dtrace_object_alloc(_base) [v3] In-Reply-To: References: Message-ID: <5ddSIvC7u6q93eA32fdwCpNuOKpfE0oOpI7qcL1yZ9I=.c2a2077d-9b82-4330-a5c9-2a893996d374@github.com> On Mon, 8 Nov 2021 02:42:58 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base). >> >> JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modified the callsites(interpreter/c1/c2). >> >> To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc. >> >> Thanks, >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update according to comments These casts are hard to look at but it seems fine. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6181 From ddong at openjdk.java.net Mon Nov 8 14:34:40 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 8 Nov 2021 14:34:40 GMT Subject: Integrated: 8276209: Some call sites doesn't pass the parameter 'size' to SharedRuntime::dtrace_object_alloc(_base) In-Reply-To: References: Message-ID: On Sun, 31 Oct 2021 15:08:11 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this fix that corrects the oop size value of dtrace_object_alloc(_base). > > JDK-8039904 added a new parameter 'size' to SharedRuntime::dtrace_object_alloc and dtrace_object_alloc_base, but didn't modified the callsites(interpreter/c1/c2). > > To make this fix as simple as possible, I overloaded dtrace_object_alloc_base rather than dtrace_object_alloc. 
> > Thanks, > Denghui This pull request has now been integrated. Changeset: c815c5cb Author: Denghui Dong URL: https://git.openjdk.java.net/jdk/commit/c815c5cbbb0b6a2aebd0a38cb930c74bd665d082 Stats: 22 lines in 13 files changed: 6 ins; 0 del; 16 mod 8276209: Some call sites doesn't pass the parameter 'size' to SharedRuntime::dtrace_object_alloc(_base) Reviewed-by: dholmes, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/6181 From duke at openjdk.java.net Tue Nov 9 10:06:42 2021 From: duke at openjdk.java.net (duke) Date: Tue, 9 Nov 2021 10:06:42 GMT Subject: Withdrawn: 8261492: Shenandoah: reconsider forwardee accesses memory ordering In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 08:55:39 GMT, Aleksey Shipilev wrote: > Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. > > For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. > > The reader side is much more interesting, because we generally want "consume", but it is not available. We can do "acquire", but it regresses performance all too much. The close inspection of the code reveals we need "acquire" on many paths, but not on the most critical one: heap updates. This must explain why current weaker reader side was never seen to fail, and this also opens a way to get `acquire`-in-lieu-of-`consume` without the observable performance penalty. > > The relaxation in forwardee installation improves concurrent evacuation quite visibly. 
See for example GC cycle times with SPECjvm2008, Compiler.sunflow on AArch64: > > Before: > > > [info][gc,stats] Concurrent Evacuation = 3.421 s (a = 21247 us) (n = 161) > [info][gc,stats] Concurrent Evacuation = 3.584 s (a = 21080 us) (n = 170) > [info][gc,stats] Concurrent Evacuation = 3.226 s (a = 21088 us) (n = 153) > [info][gc,stats] Concurrent Evacuation = 3.270 s (a = 20827 us) (n = 157) > [info][gc,stats] Concurrent Evacuation = 3.339 s (a = 20742 us) (n = 161) > > > After: > > [info][gc,stats] Concurrent Evacuation = 3.109 s (a = 18617 us) (n = 167) > [info][gc,stats] Concurrent Evacuation = 3.027 s (a = 18918 us) (n = 160) > [info][gc,stats] Concurrent Evacuation = 2.862 s (a = 17669 us) (n = 162) > [info][gc,stats] Concurrent Evacuation = 2.858 s (a = 17425 us) (n = 164) > [info][gc,stats] Concurrent Evacuation = 2.883 s (a = 17685 us) (n = 163) > > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `hotspot_gc_shenandoah` > - [x] Linux x86_64 `tier1` with Shenandoah > - [x] Linux AArch64 `tier1` with Shenandoah This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From darcy at openjdk.java.net Tue Nov 9 17:36:42 2021 From: darcy at openjdk.java.net (Joe Darcy) Date: Tue, 9 Nov 2021 17:36:42 GMT Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources In-Reply-To: References: Message-ID: <8tp4-qz24AVj-fehq34bqnlQaredE_VuoYG9-_Mq68c=.716e3172-a465-43af-a566-58c78ac72839@github.com> On Thu, 4 Nov 2021 07:26:34 GMT, Alan Bateman wrote: >> Given the 'R' in CSR already stands for Review this should have said "CSR request". >> >> But I also have no idea what the comment is actually trying to say - what is "these" referring to??? > > I don't know why that comment is there. The API is Class::getSigners and any changes to its behavior would require a CSR, but we are free to change the implementation. So maybe the comment should be removed. 
Filed JDK-8276889 in case further cleanup of the wording in instanceKlass.cpp is desired. ------------- PR: https://git.openjdk.java.net/jdk/pull/6240 From coleenp at openjdk.java.net Tue Nov 9 18:17:42 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 9 Nov 2021 18:17:42 GMT Subject: RFR: JDK-8276588: Change "ccc" to "CSR" in HotSpot sources In-Reply-To: <8tp4-qz24AVj-fehq34bqnlQaredE_VuoYG9-_Mq68c=.716e3172-a465-43af-a566-58c78ac72839@github.com> References: <8tp4-qz24AVj-fehq34bqnlQaredE_VuoYG9-_Mq68c=.716e3172-a465-43af-a566-58c78ac72839@github.com> Message-ID: <3zjndpnkWASjczngWhkP4X3a5dVGeICaKIite9uicOg=.48267e6d-293d-4b3c-8fbe-75fdc6cded58@github.com> On Tue, 9 Nov 2021 17:33:07 GMT, Joe Darcy wrote: >> I don't know why that comment is there. The API is Class::getSigners and any changes to its behavior would require a CSR, but we are free to change the implementation. So maybe the comment should be removed. > > Filed JDK-8276889 in case further cleanup of the wording in instanceKlass.cpp is desired. Oh at one point we were trying to figure out a different way of implementing signers so that it didn't have to store a field per InstanceKlass, when we were working on density. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6240 From coleenp at openjdk.java.net Tue Nov 9 20:04:37 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 9 Nov 2021 20:04:37 GMT Subject: RFR: 8269986: Remove +3 from Symbol::identity_hash() In-Reply-To: References: Message-ID: <3QfGGc4vIbwBz-k8URuVmp2bVWOID4UQmEwKBSQo7Ls=.61a1aca5-7b04-4017-a37a-3f82a6327e9c@github.com> On Sun, 7 Nov 2021 21:10:35 GMT, Ioi Lam wrote: > Please review this change that removes the `+3` from here: > > > unsigned Symbol::identity_hash() const { > unsigned addr_bits = (unsigned)((uintptr_t)this >> (LogMinObjAlignmentInBytes + 3)); > ^^^ > return ((unsigned)extract_hash(_hash_and_refcount) & 0xffff) | > ((addr_bits ^ (length() << 8) ^ (( _body[0] << 8) | _body[1])) << 16); > } > > > The `+3` was intended to avoid getting the same value for these bits: > > > ((uintptr_t)this) >> LogMinObjAlignmentInBytes) & 0x07) > > > However, as shown in the [bug report](https://bugs.openjdk.java.net/browse/JDK-8269986), the values for these bits are evenly distributed. So the `+3` is not necessary and may actually be counter-productive. > > Testing: Oracle CI tiers 1-4 Looks good! Thanks for doing the performance analysis. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6287 From duke at openjdk.java.net Wed Nov 10 01:10:45 2021 From: duke at openjdk.java.net (duke) Date: Wed, 10 Nov 2021 01:10:45 GMT Subject: Withdrawn: 8273239: Standardize Ticks APIs return type In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 14:38:52 GMT, Albert Mingkun Yang wrote: > Simple change on return types of Ticks API. > > The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. > > Test: tier1 This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From chris.plummer at oracle.com Wed Nov 10 05:50:38 2021 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 9 Nov 2021 21:50:38 -0800 Subject: [External] : Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd In-Reply-To: References: <7f2df6ad-7d73-46ac-a23e-959fd6b4d4af.denghui.ddh@alibaba-inc.com> <6581e9e4-851a-c562-37af-915ff3fdc492@oracle.com> <8a5c6087-a131-4ddd-9195-0f1e51705351.denghui.ddh@alibaba-inc.com> Message-ID: <0d05daaa-82cb-1537-5292-03d8e1d1d625@oracle.com> Hi Denghui, Following up here with something that was discussed in the other email thread, Ioi asked if an MBean could be used to provide similar app diagnostics. It seems it can be. Erik also mentioned to me that the REST API can be used for something like this. The example he gave is a query something like the following: curl http://localhost:8080?command=foo?param1=bar Also it has been pointed out that sockets could be used. I know a jcmd might be easier to use/access than any of these other 3 approaches, but we have to question if it is worth adding given all the concerns that have been pointed out so far. thanks, Chris On 11/5/21 1:34 AM, Yi Yang wrote: > Hi all, > > I had an offline discussion about this with Denghui. When I first > heard this idea, I felt it was useful. It allows users to do some stuff > that requires a lot of effort in a simple way. I'm also tracking > discussion on the mailing list, I've seen many folks come up with very > constructive comments and questions/concerns. In order to make the > follow-up discussion simple, I want to try to summarize and give some > answers on behalf of myself. Each headline is a question/concern that > folks are concerned about, followed by my personal opinion on it. I'd > appreciate it if you can append any missing content. > > === What is it?
> It provides the ability for users to trigger predefined callbacks > while the application is running. > > === May misuse? > It is provided through jcmd, this ability should ideally be used for > debugging/development/diagnosis purposes. It may be misused, but this > is beyond our control, just as users can use a signal handler to > download an app and play a song. > > === Maintainability? > It expands the current jcmd implementation rather than making a significant > modification, so maintainability should be ok IMHO. > > === Safety? > Undeniably, it may raise some potential security issues. > > === Alternatives? > Socket: It is inconvenient for users to simply do the same thing > compared to this, we have to write a lot of boilerplate socket code. > Signal: Not open to users, a limited number of signals, more likely > to be misused. > > === Purpose? > 1. I have a web application that can analyze Java heap dumps. I hope to > provide a simple way to report runtime app metrics, such as disk usage > and online worker load, instead of writing a complete web page and > providing an admin page to access it. This information can also be > gathered on other monitoring platforms. > 2. Trigger the DEBUG functionality while running, output some debug logs > > Best regards. > > ------------------------------------------------------------------ > From:Chris Plummer > Send Time:2021 Nov. 4 (Thu.) 14:10 > To:dong denghui ; serviceability-dev > ; hotspot-dev > > Subject:Re: [External] : Re: RFC: Extend DCmd(Diagnostic-Command) > framework to support Java level DCmd > > Hi Denghui, > > Yes, there are other ways the same thing could be accomplished > like sockets or signals, but all of this is outside of the purview > of the JDK, and therefore we don't become responsible for its > design, maintenance, and potential security concerns. > EnableUserLevelDCmd doesn't really fix any of these concerns, > because an app can just always launch with this flag enabled.
It > really should be reserved for launching a JVM for the specific > purpose of gathering some extra diagnostic data, but there is no > way to enforce that. > > Anyway, I'm not the gatekeeper on this. Just expressing some of my > concerns. Others have done the same. I think we've seen a lack of > enthusiasm in favor of doing this except from you. It would be good > to see input from others that would like this feature in place. > > cheers, > > Chris > > On 11/1/21 8:09 PM, Denghui Dong wrote: > Hi Chris, > > Thank you for the comments. > > Yes, we have no good way to restrict user-registered commands to only include diagnosis-related operations, but in my opinion, this does not seem to be a problem that must be solved perfectly. > > The following are my thoughts. > > This extension is an entry that triggers the operation that the user wants to perform (similar to the Signal Handler mechanism but with a name and parameters). Even without this extension, the user can have other ways to achieve the same goal. > > On the one hand, we could standardize the usage scenarios of the API in the documentation (indeed, users can still write programs not in accordance with the specifications; for example, users can implement multiple calls to the same object's hashCode method to return different values or make an object alive again while the finalize method is executing). > > On the other hand, we can add some restrictions to help users make better use of this extension. > e.g. we can add a new VM option, such as EnableUserLevelDCmd, so that the application can only register custom commands when this option is enabled. > > Or from another perspective, can we allow users to do some non-diagnostic-related operations in custom commands? > > Best, > Denghui > ------------------------------------------------------------------ > From:Chris Plummer > Send Time:2021?11?2?(???) 03:35 > To:???(??)
; serviceability-dev; hotspot-dev
> Subject:Re: RFC: Extend DCmd(Diagnostic-Command) framework to support Java level DCmd
>
> I have similar concerns to those others have expressed, so I'll
> try to add something new to the discussion and not just repeat.
>
> DCMDs have historically been very VM centric. That's not to say
> they aren't useful for debugging applications, but they do so by
> providing VM related info like stack traces, heap dumps, and class
> histograms. Also hotspot has been the gatekeeper for new DCMDs,
> meaning that new ones do not get added without going through the
> hotspot review process.
>
> Allowing any application or framework to add a DCMD changes this
> VM centric view in a way that concerns me. This approach allows a
> DCMD to pretty much do anything (java security notwithstanding).
> App writers could even use them to provide a user facing
> interface. For example, if an app has some sort of internal database,
> it could allow users to query it via a DCMD, and maybe even
> suggest that users write simple shell scripts that use jcmd to do
> these queries. Allowing this type of non-diagnostic usage seems
> like a path we don't want to go down, yet I don't see how it can
> be prevented once you allow applications to add DCMDs.
>
> Chris
>
> On 10/25/21 1:37 AM, Denghui Dong wrote:
> Hi there!
>
> We'd like to discuss a proposal for extending the current DCmd framework to support Java level DCmd.
>
> At present, DCmd only allows the VM to register commands, which can be called through jcmd or JMX. It would be beneficial if the user could create their own commands.
>
> The idea of this extension originally came from our internal Java agent that detects the misuse of the Unsafe API.
>
> This agent can collect the call sites that allocate or free direct memory in the application (NMT could not do it IMO) to detect direct memory leaks.
>
> In the beginning, it just printed all call sites, without any statistical function, so it was hard to use.
>
> So we plan to use a way similar to jeprof (from jemalloc) to generate a report file that aggregates all useful information.
>
> During the implementation process, we found that we need a mechanism to notify the agent to generate reports.
>
> The common practice is:
> a) Register a service port, triggered by an HTTP request
> b) Triggered by signal
> c) Generate reports periodically, or when the process exits
>
> But these three ways have certain problems.
> For a) we need to introduce a network component, will increase the complexity of implementation
> For b) we cannot pass parameters
> For c) some files that may never be used will be generated
>
> Essentially, this question is how to notify the application to do a certain task, or in other words, how do we issue a command to the application. We believe that other Java developers will also encounter similar problems.
>
> (And sometimes there may be multiple unrelated dependent components in a Java application that require such a mechanism.)
>
> Naturally, we think that jcmd can already issue some commands registered in VM to the application, why can't we extend to the java level?
>
> This feature will be very useful for some lightweight tools, just like the scenario we encountered, to notify the tools to perform certain operations.
>
> In addition, this feature will also bring benefits to Java beginners.
>
> For example, in the beginning, beginners may not use advanced log components, but they will also encounter the need to output debug logs. They may write code like this:
>
> ```
>     if (debug) {
>       System.out.println("...");
>     }
> ```
>
> If developers can easily control the value of debug, it's attractive.
>
> Like this:
>
> ```
>     Factory.register("MyApp.flipDebug", out -> debug = !debug);
>
>     jcmd  MyApp.flipDebug
> ```
>
> For mainstream frameworks, we can apply this feature to trigger some common activities, such as health checks, graceful shutdown, and dynamic configuration updates. But to be honest, these frameworks are very mature and stable, and for compatibility purposes, it's hard to let them use this extension.
>
> Comments welcome!
>
> Thanks,
> Denghui

From duke at openjdk.java.net Wed Nov 10 12:39:59 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Wed, 10 Nov 2021 12:39:59 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
Message-ID:

PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
of its uses is to protect against ROP based attacks. This is done by
signing the Link Register whenever it is stored on the stack, and
authenticating the value when it is loaded back from the stack. If an
attacker were to try to change control flow by editing the stack then
the authentication check of the Link Register will fail, causing a
segfault when the function returns.

On a system with PAC enabled, it is expected that all applications will
be compiled with ROP protection. Fedora 33 and upwards already provide
this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
PAC instructions that exist in the NOP space - on hardware without PAC,
these instructions act as NOPs, allowing backward compatibility for
negligible performance cost (2 NOPs per non-leaf function).

Hardware is currently limited to the Apple M1 MacBooks. All testing has
been done within a Fedora Docker image. A run of SpecJVM showed no
difference to that of noise - which was surprising.

The most important part of this patch is simply compiling using branch
protection provided by GCC/LLVM. This protects all C++ code from being
used in ROP attacks, removing all static ROP gadgets from use.
The remainder of the patch adds ROP protection to runtime generated
code, in both stubs and compiled Java code. Attacks here are much harder
as ROP gadgets must be found dynamically at runtime. If/when AOT
compilation is added to JDK, then all stubs and compiled Java will be
susceptible to ROP gadgets being found by static analysis and therefore
potentially as vulnerable as C++ code.

There are a number of places where the VM changes control flow by
rewriting the stack or otherwise. I've done some analysis as to how
these could also be used for attacks (which I didn't want to post here).
These areas can be protected by ensuring the pointers to various stubs and
entry points are stored in memory as signed pointers. These changes are
simple to make (they can be reduced to a type change in common code and
a few additional sign/auth calls in the backend), but there are a lot of them
and the total code change is fairly large. I'm happy to provide a few
work-in-progress patches.

In order to match the security benefits of the Apple Arm64e ABI across
the whole of JDK, then all the changes mentioned above would be
required.
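
For readers unfamiliar with the sign/authenticate idea described in this message, here is a toy C++ model of it. It is a sketch under stated assumptions, not code from the patch and not how the hardware computes its codes: `toy_sign`, `toy_authenticate`, `fold16`, the 48-bit address width, and the XOR-folding "code" are all invented for illustration and are deliberately trivial and insecure. Real PAC uses dedicated instructions and keys that user code cannot read.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: every name and constant here is hypothetical.
constexpr int kAddressBits = 48;  // assumed virtual address width
constexpr std::uint64_t kAddressMask = (std::uint64_t{1} << kAddressBits) - 1;

// Fold a 64-bit value into 16 bits by XOR. Insecure on purpose; it stands in
// for the keyed computation that real hardware performs.
inline std::uint64_t fold16(std::uint64_t x) {
  return (x ^ (x >> 16) ^ (x >> 32) ^ (x >> 48)) & 0xFFFF;
}

// The "authentication code" depends on the address, a modifier (here the
// stack pointer of the frame), and a key.
inline std::uint64_t toy_pac(std::uint64_t addr, std::uint64_t modifier,
                             std::uint64_t key) {
  return fold16(addr ^ modifier ^ key);
}

// "Sign": put the code into the unused upper bits before the return address
// is stored on the stack.
inline std::uint64_t toy_sign(std::uint64_t addr, std::uint64_t sp,
                              std::uint64_t key) {
  std::uint64_t a = addr & kAddressMask;
  return a | (toy_pac(a, sp, key) << kAddressBits);
}

// "Authenticate": recompute the code after the value is loaded back. Returns
// false if the stored value was modified or belongs to a different frame.
inline bool toy_authenticate(std::uint64_t signed_addr, std::uint64_t sp,
                             std::uint64_t key, std::uint64_t* out) {
  std::uint64_t a = signed_addr & kAddressMask;
  if ((signed_addr >> kAddressBits) != toy_pac(a, sp, key)) {
    return false;
  }
  *out = a;
  return true;
}
```

In this model an attacker who overwrites the saved value without knowing the key cannot produce a matching code, so the check at function return fails instead of jumping to the attacker's address. Using the stack pointer as the modifier also means a signed value copied into a different frame does not authenticate.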
-------------

Commit messages:
 - 8264130: PAC-RET protection for Linux/AArch64
 - Add PAC assembly instructions
 - Add AArch64 ROP protection runtime flag
 - Build with branch protection

Changes: https://git.openjdk.java.net/jdk/pull/6334/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8264130
  Stats: 1273 lines in 25 files changed: 457 ins; 20 del; 796 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6334.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334

PR: https://git.openjdk.java.net/jdk/pull/6334

From sspitsyn at openjdk.java.net Wed Nov 10 12:44:41 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Wed, 10 Nov 2021 12:44:41 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace
In-Reply-To:
References:
Message-ID: <7gYL85rBe8eKvM0anhb3qhZ5Y7xaUFsWwD9JeO1AioI=.b818affe-36fd-404a-8d6b-45ab93c8fab3@github.com>

On Thu, 7 Oct 2021 12:42:48 GMT, Aleksey Shipilev wrote:

> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
>
> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers.
>
> Additional testing:
> - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now pass
> - [x] Linux x86_64 Zero works with `async-profiler`

src/hotspot/cpu/zero/frame_zero.cpp line 174:

> 172:
> 173: // validate locals
> 174: address locals = (address) *interpreter_frame_locals_addr();

Unneeded spaces around '(address)'.
-------------

PR: https://git.openjdk.java.net/jdk/pull/5848

From sspitsyn at openjdk.java.net Wed Nov 10 12:50:38 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Wed, 10 Nov 2021 12:50:38 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace
In-Reply-To:
References:
Message-ID: <68Lgv_Hwls0iUcUZwRMANWQi7TEYT4K1XFPRZyB071o=.33d613b4-207f-41b8-b976-6edb4ba9eb48@github.com>

On Thu, 7 Oct 2021 12:42:48 GMT, Aleksey Shipilev wrote:

> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
> [...]

Hi Aleksey,

Thank you for the update.
It looks pretty good to me.
I've inlined a couple of minor comments.
Also, I hope, you will update the copyright years.

Thanks,
Serguei

src/hotspot/share/prims/forte.cpp line 348:

> 346: return false;
> 347: }
> 348: #endif

Could you, please, add some simple comments explaining each case at lines: 325, 329 and 336?

-------------

Marked as reviewed by sspitsyn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5848

From rkennke at openjdk.java.net Wed Nov 10 12:52:08 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Wed, 10 Nov 2021 12:52:08 GMT
Subject: RFR: 8275527: Refactor forward pointer access [v5]
In-Reply-To:
References:
Message-ID:

> Accessing the forward pointer is currently a little inconsistent.
Some code paths call oopDesc::forwardee() / oopDesc::is_forwarded(), some code paths call forwardee() and check it for ==/!= NULL, some code paths even call markWord::decode_pointer() and markWord::is_marked() instead.
>
> This change attempts to make the situation more consistent. For simple cases it preserves oopDesc::forwardee() / is_forwarded(), some cases need to use the markWord for consistency in concurrent GC, they now use markWord::forwardee() and markWord::is_forwarded(). Also, checking whether or not an object is forwarded is now consistently done using is_forwarded() and not by checking forwardee ==/!= NULL. This also resolves the mess in G1 full GC that changes not-forwarded objects to have a NULL (fake-) pointer. This is not necessary, because we can just as well use the lock bits to determine whether or not the object is forwarded.
>
> Testing:
> - [x] tier
> - [x] tier2
> - [x] hotspot_gc

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:

 - Merge branch 'master' into optimize-fwdptr
 - Don't use forwarded terminology in markWord
 - Move forward impl into markWord and add assert
 - Fix Parallel GC mistake
 - Revert unnecessary changes
 - Update some copyright headers
 - Add missing includes
 - Merge branch 'master' into optimize-fwdptr
 - Add missing includes
 - Rename mwd -> fwd
 - ...
and 4 more: https://git.openjdk.java.net/jdk/compare/a0b84453...d63962a3

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5955/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5955&range=04
  Stats: 46 lines in 9 files changed: 4 ins; 26 del; 16 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5955.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5955/head:pull/5955

PR: https://git.openjdk.java.net/jdk/pull/5955

From rkennke at openjdk.java.net Wed Nov 10 12:52:13 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Wed, 10 Nov 2021 12:52:13 GMT
Subject: RFR: 8275527: Refactor forward pointer access [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 1 Nov 2021 09:25:52 GMT, Stefan Karlsson wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Move forward impl into markWord and add assert
>
> src/hotspot/share/oops/markWord.hpp line 253:
>
>> 251: return cast_to_oop(decode_pointer());
>> 252: }
>> 253: };
>
> This brings the forwarded/forwardee terminology into the markWord. The markWord was previously decoupled from those two concepts. I would personally let those function names stay in oopDesc and not leak down into the markWord. If you do want to keep it here, could you update the comments at the top that describe the bits?
>
> // [ptr | 11] marked used to mark an object

Yeah, I am not quite sure about this. We have a couple of places where we need to use the markWord directly, and they read m.is_marked() (when it really means is_forwarded, even though it's the same in the implementation), and then go on to cast_to_oop(m.decode_pointer()), which reads uglier than simply m.forwardee(), which also comes with an assert and the cast. I reverted the markWord change and related call-sites now. Maybe this warrants more thinking/discussion.
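
To make the trade-off in this exchange concrete, here is a minimal C++ sketch of a mark word whose two low bits encode the "marked" state (`[ptr | 11]`, as in the comment quoted above) and whose `forwardee()` accessor bundles the assert and the decoding. It is a simplified stand-in, not HotSpot's `markWord`: the class name `ToyMarkWord`, the use of `void*` instead of `oop`, and the exact masks are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a mark word; names follow the thread, the layout is assumed.
class ToyMarkWord {
  std::uintptr_t _value;

  static constexpr std::uintptr_t lock_mask = 0x3;     // two low "lock" bits
  static constexpr std::uintptr_t marked_value = 0x3;  // [ptr | 11] means marked

 public:
  explicit ToyMarkWord(std::uintptr_t v) : _value(v) {}

  // Store a forwarding pointer in the mark word and tag it as marked.
  static ToyMarkWord encode_forwardee(void* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & lock_mask) == 0 && "forwardee must be at least 4-byte aligned");
    return ToyMarkWord(bits | marked_value);
  }

  // Forwarded-ness is decided by the lock bits, not by comparing a pointer to NULL.
  bool is_forwarded() const { return (_value & lock_mask) == marked_value; }

  // One call that asserts the state and strips the tag bits.
  void* forwardee() const {
    assert(is_forwarded() && "only forwarded mark words carry a forwardee");
    return reinterpret_cast<void*>(_value & ~lock_mask);
  }
};
```

With an accessor like this, a call site reads `if (m.is_forwarded()) { p = m.forwardee(); }` rather than checking `is_marked()` and then decoding and casting by hand, which is the readability point made above. Whether such names belong in the mark word or should stay in `oopDesc` is exactly the open question in the review.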
-------------

PR: https://git.openjdk.java.net/jdk/pull/5955

From aph at openjdk.java.net Wed Nov 10 13:14:43 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 10 Nov 2021 13:14:43 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To:
References:
Message-ID:

On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward wrote:

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks.
> [...]

Gosh. This is going to take some time to review, and will need at least two reviewers.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net Wed Nov 10 13:25:40 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 10 Nov 2021 13:25:40 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To:
References:
Message-ID:

On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward wrote:

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks.
> [...]
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5185:

> 5183: // ROP Protection
> 5184:
> 5185: void MacroAssembler::protect_return_address() {

We need proper, full, detailed comments about what these functions do, with reference to primary AArch64 documentation.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From erikj at openjdk.java.net Wed Nov 10 13:34:38 2021
From: erikj at openjdk.java.net (Erik Joelsson)
Date: Wed, 10 Nov 2021 13:34:38 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To:
References:
Message-ID: <9inSsWwjEQnZT_x-9GirjL-Avmycfnyj6yZoqCJ8M4g=.770fc4bb-6731-4eb2-83a7-cf018438ed7e@github.com>

On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward wrote:

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks.
> [...]

Build change looks good, but I can't comment on the code changes.

-------------

Marked as reviewed by erikj (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net Wed Nov 10 13:34:39 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Wed, 10 Nov 2021 13:34:39 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To:
References:
Message-ID: <4yyBp8jgXpmK0ZyywX4mrjo5vfTWfy5CHV97fnX4-EE=.2bd02511-67cd-40a4-8fb0-a79b095f4bcd@github.com>

On Wed, 10 Nov 2021 13:11:21 GMT, Andrew Haley wrote:

> Gosh. This is going to take some time to review, and will need at least two reviewers.

Sure. And thanks in advance.
-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net Wed Nov 10 13:37:41 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 10 Nov 2021 13:37:41 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To:
References:
Message-ID:

On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward wrote:

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks.
> [...]

src/hotspot/os_cpu/bsd_aarch64/pauth_bsd_aarch64.inline.hpp line 25:

> 23: */
> 24:
> 25: #ifndef OS_CPU_BSD_AARCH64_PAUTH_BSD_AARCH64_INLINE_HPP

Are these two files different enough to separate them for BSD and Linux?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From ihse at openjdk.java.net Wed Nov 10 14:37:40 2021
From: ihse at openjdk.java.net (Magnus Ihse Bursie)
Date: Wed, 10 Nov 2021 14:37:40 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To:
References:
Message-ID:

On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward wrote:

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks.
> [...]

Changes requested by ihse (Reviewer).

make/autoconf/flags-cflags.m4 line 899:

> 897: elif test "x$TOOLCHAIN_TYPE" = xgcc || test "x$TOOLCHAIN_TYPE" = xclang; then
> 898: # Check that the compiler actually supports branch protection.
> 899: FLAGS_COMPILER_CHECK_ARGUMENTS(ARGUMENT: [${BRANCH_PROTECTION_FLAG}],

This branch misses a AC_MSG_RESULT, which prints the newline. The resulting output will look messy.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net Wed Nov 10 15:04:39 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Wed, 10 Nov 2021 15:04:39 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64
In-Reply-To:
References:
Message-ID:

On Wed, 10 Nov 2021 13:34:38 GMT, Andrew Haley wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks.
>> [...]
>
> src/hotspot/os_cpu/bsd_aarch64/pauth_bsd_aarch64.inline.hpp line 25:
>
>> 23: */
>> 24:
>> 25: #ifndef OS_CPU_BSD_AARCH64_PAUTH_BSD_AARCH64_INLINE_HPP
>
> Are these two files different enough to separate them for BSD and Linux?

My motivation was to avoid having any ifdefs - but we need one anyway for the apple ifdef. If I merged the two we would end up with just the contents of the BSD version of the file. There is also the windows version of the file, which for now has empty functions.
If PAC in windows is added, that'll either use the same code or maybe Windows will provide an API (like the Apple one). Merging everything would mean windows gains the UseROPProtection check. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From adinn at openjdk.java.net Wed Nov 10 15:27:41 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Wed, 10 Nov 2021 15:27:41 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward wrote: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. 
Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. I am also reviewing this. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Wed Nov 10 16:03:34 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Wed, 10 Nov 2021 16:03:34 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 14:34:18 GMT, Magnus Ihse Bursie wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns.
>> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches.
>> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > make/autoconf/flags-cflags.m4 line 899: > >> 897: elif test "x$TOOLCHAIN_TYPE" = xgcc || test "x$TOOLCHAIN_TYPE" = xclang; then >> 898: # Check that the compiler actually supports branch protection. >> 899: FLAGS_COMPILER_CHECK_ARGUMENTS(ARGUMENT: [${BRANCH_PROTECTION_FLAG}], > > This branch misses a AC_MSG_RESULT, which prints the newline. The resulting output will look messy. Looking at this block of code again, I've got far too many outputted lines compared to other features. Removing some means I can simplify the code too, so I'll do that. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From shade at openjdk.java.net Wed Nov 10 16:26:10 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Nov 2021 16:26:10 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v2] In-Reply-To: References: Message-ID: > This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler. > > Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers. > > Additional testing: > - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now pass > - [x] Linux x86_64 Zero works with `async-profiler` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Review feedback - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace - Initial work: runs async-profiler successfully ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5848/files - new: https://git.openjdk.java.net/jdk/pull/5848/files/5575516c..8e25258d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=00-01 Stats: 888778 lines in 1818 files changed: 455790 ins; 426281 del; 6707 mod Patch: https://git.openjdk.java.net/jdk/pull/5848.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5848/head:pull/5848 PR: https://git.openjdk.java.net/jdk/pull/5848 From shade at openjdk.java.net Wed Nov 10 16:26:16 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Nov 2021 16:26:16 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v2] In-Reply-To: <7gYL85rBe8eKvM0anhb3qhZ5Y7xaUFsWwD9JeO1AioI=.b818affe-36fd-404a-8d6b-45ab93c8fab3@github.com> References: <7gYL85rBe8eKvM0anhb3qhZ5Y7xaUFsWwD9JeO1AioI=.b818affe-36fd-404a-8d6b-45ab93c8fab3@github.com> Message-ID: <5Z7ibS9XSydk2okYR911xl6Q0GSz7gEzZbp0MW7_Edo=.0a86e320-8dd3-4216-ae79-818df7cd6b38@github.com> On Wed, 10 Nov 2021 12:41:44 GMT, Serguei Spitsyn wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Review feedback >> - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace >> - Initial work: runs async-profiler successfully > > src/hotspot/cpu/zero/frame_zero.cpp line 174: > >> 172: >> 173: // validate locals >> 174: address locals = (address) *interpreter_frame_locals_addr(); > > Unneeded spaces around '(address)'. Fixed. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5848 From shade at openjdk.java.net Wed Nov 10 16:38:57 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Nov 2021 16:38:57 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v3] In-Reply-To: References: Message-ID: > This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler. > > Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers. > > Additional testing: > - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now pass > - [x] Linux x86_64 Zero works with `async-profiler` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More reviews ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5848/files - new: https://git.openjdk.java.net/jdk/pull/5848/files/8e25258d..68ef4b63 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=01-02 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5848.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5848/head:pull/5848 PR: https://git.openjdk.java.net/jdk/pull/5848 From shade at openjdk.java.net Wed Nov 10 16:38:58 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Nov 2021 16:38:58 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v3] In-Reply-To: <68Lgv_Hwls0iUcUZwRMANWQi7TEYT4K1XFPRZyB071o=.33d613b4-207f-41b8-b976-6edb4ba9eb48@github.com> References:
<68Lgv_Hwls0iUcUZwRMANWQi7TEYT4K1XFPRZyB071o=.33d613b4-207f-41b8-b976-6edb4ba9eb48@github.com> Message-ID: On Wed, 10 Nov 2021 12:44:16 GMT, Serguei Spitsyn wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> More reviews > > src/hotspot/share/prims/forte.cpp line 348: > >> 346: return false; >> 347: } >> 348: #endif > > Could you, please, add some simple comments explaining each case at lines: 325, 329 and 336? See new commits! ------------- PR: https://git.openjdk.java.net/jdk/pull/5848 From rkennke at openjdk.java.net Wed Nov 10 16:55:54 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 10 Nov 2021 16:55:54 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently Message-ID: The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
Testing: - [x] tier1 - [x] tier2 - [ ] tier3 - [ ] tier4 ------------- Commit messages: - Change VerifyHeavyMonitors flag to diagnostic - 8276901: Implement UseHeavyMonitors consistently Changes: https://git.openjdk.java.net/jdk/pull/6320/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276901 Stats: 190 lines in 12 files changed: 54 ins; 18 del; 118 mod Patch: https://git.openjdk.java.net/jdk/pull/6320.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320 PR: https://git.openjdk.java.net/jdk/pull/6320 From mdoerr at openjdk.java.net Wed Nov 10 17:12:44 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 10 Nov 2021 17:12:44 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6] In-Reply-To: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> References: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> Message-ID: On Thu, 4 Nov 2021 16:28:52 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it. Thanks for adding a test. Your new additions look basically good, but I have a few remarks and questions. src/hotspot/share/prims/whitebox.cpp line 987: > 985: bool overflow = false; > 986: for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) { > 987: if (reason_str != NULL && !strcmp(reason_str, Deoptimization::trap_reason_name(reason))) { Maybe the code would be better readable when checking `reason_str != NULL` first and then use 2 loops? Just a minor suggestion. Should only be done if readability is better. src/hotspot/share/prims/whitebox.cpp line 1016: > 1014: } > 1015: ResourceMark rm(THREAD); > 1016: char *reason_str = (reason_obj == NULL) ? 
I think we should use `const char*` as far as possible. src/hotspot/share/runtime/deoptimization.cpp line 2695: > 2693: return 0; > 2694: } > 2695: Why do we need this? Is it a placeholder for a future enhancement? If so, a comment would at least be helpful. test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 78: > 76: private static final WhiteBox WB = WhiteBox.getWhiteBox(); > 77: // Until JDK-8275908 is not fixed, null-pointer traps for invokes and array-store traps are not profiled in the interpreter. > 78: private static final boolean JDK8275908_fixed = false; I don't know if that one should get fixed first, but I'm ok with your workaround. Would it make sense to add that bug id to this test's header? ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From coleenp at openjdk.java.net Wed Nov 10 17:24:47 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 10 Nov 2021 17:24:47 GMT Subject: RFR: 8276658: Clean up JNI local handles code Message-ID: JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread. Move the fields to JavaThread and adding JavaThread* argument. Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed. Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code. The commits are separate to help reviewing, but the entire change has been tested together with tier1-6. The commits in this change have been performance tested individually and together with no meaningful differences from mainline. 
------------- Commit messages: - The VM Thread creates handles on the caller thread, unless it runs out then it allocates a block on its own thread, which it never cleans up. Pass the caller thread to allocate_handle so that allocate_block will add to the right thread, which is a JavaThread. - Refactor pushing and popping JNIHandleBlocks. - Remove JNIHandleBlock global freelists and Mutex - Move active_handles to JavaThread. Changes: https://git.openjdk.java.net/jdk/pull/6336/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6336&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276658 Stats: 426 lines in 25 files changed: 77 ins; 302 del; 47 mod Patch: https://git.openjdk.java.net/jdk/pull/6336.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6336/head:pull/6336 PR: https://git.openjdk.java.net/jdk/pull/6336 From sspitsyn at openjdk.java.net Wed Nov 10 18:05:35 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Wed, 10 Nov 2021 18:05:35 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v3] In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 16:38:57 GMT, Aleksey Shipilev wrote: >> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler. >> >> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers.
>> >> Additional testing: >> - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now pass >> - [x] Linux x86_64 Zero works with `async-profiler` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More reviews Marked as reviewed by sspitsyn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5848 From duke at openjdk.java.net Wed Nov 10 18:13:38 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 10 Nov 2021 18:13:38 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v13] In-Reply-To: <31DzKXEmMNYWZ1NL3FroXD7dCIDhwBJNzRotZCkKTqg=.30bc4ee0-9701-4cf7-925d-27901f47cdcc@github.com> References: <31DzKXEmMNYWZ1NL3FroXD7dCIDhwBJNzRotZCkKTqg=.30bc4ee0-9701-4cf7-925d-27901f47cdcc@github.com> Message-ID: On Mon, 1 Nov 2021 13:11:40 GMT, Andrew Haley wrote: >> This test is too artificial. Going through my records I've found I have a microbenchmark for `java.util.concurrent. SynchronousQueue` which shows good improvements on jdk11. `SynchronousQueue` uses `onSpinWait`. Since jdk17 `SynchronousQueue` has not been using `onSpinWait` any more (See https://bugs.openjdk.java.net/browse/JDK-8267502). Maybe I can come up with a microbenchmark based on `SynchronousQueue` [code](https://github.com/openjdk/jdk11u-dev/blob/master/src/java.base/share/classes/java/util/concurrent/SynchronousQueue.java#L412): >> >> SNode awaitFulfill(SNode s, boolean timed, long nanos) { >> /* >> * When a node/thread is about to block, it sets its waiter >> * field and then rechecks state at least one more time >> * before actually parking, thus covering race vs >> * fulfiller noticing that waiter is non-null so should be >> * woken. >> * >> * When invoked by nodes that appear at the point of call >> * to be at the head of the stack, calls to park are >> * preceded by spins to avoid blocking when producers and >> * consumers are arriving very close in time. 
This can >> * happen enough to bother only on multiprocessors. >> * >> * The order of checks for returning out of main loop >> * reflects fact that interrupts have precedence over >> * normal returns, which have precedence over >> * timeouts. (So, on timeout, one last check for match is >> * done before giving up.) Except that calls from untimed >> * SynchronousQueue.{poll/offer} don't check interrupts >> * and don't wait at all, so are trapped in transfer >> * method rather than calling awaitFulfill. >> */ >> final long deadline = timed ? System.nanoTime() + nanos : 0L; >> Thread w = Thread.currentThread(); >> int spins = shouldSpin(s) >> ? (timed ? MAX_TIMED_SPINS : MAX_UNTIMED_SPINS) >> : 0; >> for (;;) { >> if (w.isInterrupted()) >> s.tryCancel(); >> SNode m = s.match; >> if (m != null) >> return m; >> if (timed) { >> nanos = deadline - System.nanoTime(); >> if (nanos <= 0L) { >> s.tryCancel(); >> continue; >> } >> } >> if (spins > 0) { >> Thread.onSpinWait(); >> spins = shouldSpin(s) ? (spins - 1) : 0; >> } >> else if (s.waiter == null) >> s.waiter = w; // establish waiter so can park next iter >> else if (!timed) >> LockSupport.park(this); >> else if (nanos > SPIN_FOR_TIMEOUT_THRESHOLD) >> LockSupport.parkNanos(this, nanos); >> } >> } >> >> >> I've created https://bugs.openjdk.java.net/browse/JDK-8275728 to write such a microbenchmark. > > I suggest you do https://bugs.openjdk.java.net/browse/JDK-8275728 before you commit this. A benchmark which proves that this patch has some utility is needed, isn't it? Hi Andrew (@theRealAph), I've created a PR: https://github.com/openjdk/jdk/pull/6338 with a microbenchmark. 
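For readers following the thread, the spin-then-park shape of the quoted `awaitFulfill` loop can be sketched roughly as below. This is only an illustration, not the microbenchmark from the PR: the class `SpinThenPark`, its fields, and the `MAX_SPINS` value are hypothetical, and the sketch supports a single waiting thread only.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Hypothetical single-waiter handoff: spin briefly with Thread.onSpinWait(),
// then publish the waiter and park, mirroring the order used in awaitFulfill.
class SpinThenPark {
    private static final int MAX_SPINS = 1000;            // assumed value, not from the JDK
    private final AtomicReference<String> slot = new AtomicReference<>();
    private volatile Thread waiter;

    String await() {
        int spins = MAX_SPINS;
        for (;;) {
            String v = slot.get();
            if (v != null) {
                return v;                                  // value arrived
            }
            if (spins > 0) {
                Thread.onSpinWait();                       // hint to the CPU that we are busy-waiting
                spins--;
            } else if (waiter == null) {
                waiter = Thread.currentThread();           // publish waiter, recheck slot next iteration
            } else {
                LockSupport.park(this);                    // may wake spuriously; the loop rechecks
            }
        }
    }

    void fulfill(String v) {
        slot.set(v);                                       // publish the value first
        Thread w = waiter;
        if (w != null) {
            LockSupport.unpark(w);                         // wake the waiter if it registered
        }
    }
}
```

The waiter rechecks `slot` once after setting `waiter` and before parking, which is what covers the race with a producer that read `waiter` as null.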
------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From rkennke at openjdk.java.net Wed Nov 10 19:19:13 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 10 Nov 2021 19:19:13 GMT Subject: RFR: 8276901: Implement UseHeavyMonitors consistently [v2] In-Reply-To: References: Message-ID: > The flag UseHeavyMonitors seems to imply that it makes Hotspot always use inflated monitors, rather than stack locks. However, it is only implemented in the interpreter that way. When it calls into runtime, it would still happily stack-lock. Even worse, C1 uses another flag UseFastLocking to achieve something similar (with the same caveat that runtime would stack-lock anyway). C2 doesn't have any such mechanism at all. > I would like to experiment with disabling stack-locking, and thus, having this flag work as expected would seem very useful. > > The change removes the C1 flag UseFastLocking, and replaces its uses with equivalent (i.e. inverted) UseHeavyMonitors instead. I think it makes sense to make UseHeavyMonitors develop (I wouldn't want anybody to use this in production, not currently without this change, and not with this change). I also added a flag VerifyHeavyMonitors to be able to verify that stack-locking is really disabled. We can't currently verify this unconditionally (e.g. in debug builds) because all non-x86_64 platforms would need work.
> > Testing: > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [ ] tier4 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Verify monitors even in non-debug builds ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6320/files - new: https://git.openjdk.java.net/jdk/pull/6320/files/f7b4c179..49dbc146 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6320&range=00-01 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6320.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6320/head:pull/6320 PR: https://git.openjdk.java.net/jdk/pull/6320 From coleenp at openjdk.java.net Wed Nov 10 19:20:52 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 10 Nov 2021 19:20:52 GMT Subject: RFR: 8276889: Improve compatibility discussion in instanceKlass.cpp Message-ID: I removed the confusing comment that was missing some words, and linked the RFE to this one that the comment was referring to. At one point in time, I really wanted to remove this code (still do but not as much now). With JVMTI Heap functions deprecated JDK-8268242, maybe soon. Please review this trivial change. 
------------- Commit messages: - 8276889: Improve compatibility discussion in instanceKlass.cpp Changes: https://git.openjdk.java.net/jdk/pull/6340/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6340&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276889 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6340.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6340/head:pull/6340 PR: https://git.openjdk.java.net/jdk/pull/6340 From hseigel at openjdk.java.net Wed Nov 10 19:31:35 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Wed, 10 Nov 2021 19:31:35 GMT Subject: RFR: 8276889: Improve compatibility discussion in instanceKlass.cpp In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 19:13:22 GMT, Coleen Phillimore wrote: > I removed the confusing comment that was missing some words, and linked the RFE to this one that the comment was referring to. At one point in time, I really wanted to remove this code (still do but not as much now). With JVMTI Heap functions deprecated JDK-8268242, maybe soon. > Please review this trivial change. Looks good and trivial. Thanks, Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6340 From coleenp at openjdk.java.net Wed Nov 10 19:48:38 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 10 Nov 2021 19:48:38 GMT Subject: RFR: 8276889: Improve compatibility discussion in instanceKlass.cpp In-Reply-To: References: Message-ID: <8XvzTfDW6ig-MGL8sVTvZ2dUzfeMeFELDW4tPoBPX8E=.59b33576-c242-4d61-94b4-2a3151a37c4a@github.com> On Wed, 10 Nov 2021 19:13:22 GMT, Coleen Phillimore wrote: > I removed the confusing comment that was missing some words, and linked the RFE to this one that the comment was referring to. At one point in time, I really wanted to remove this code (still do but not as much now). With JVMTI Heap functions deprecated JDK-8268242, maybe soon. 
> Please review this trivial change. Thanks Harold! ------------- PR: https://git.openjdk.java.net/jdk/pull/6340 From coleenp at openjdk.java.net Wed Nov 10 19:48:38 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 10 Nov 2021 19:48:38 GMT Subject: Integrated: 8276889: Improve compatibility discussion in instanceKlass.cpp In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 19:13:22 GMT, Coleen Phillimore wrote: > I removed the confusing comment that was missing some words, and linked the RFE to this one that the comment was referring to. At one point in time, I really wanted to remove this code (still do but not as much now). With JVMTI Heap functions deprecated JDK-8268242, maybe soon. > Please review this trivial change. This pull request has now been integrated. Changeset: 67c2714b Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/67c2714ba2c9658e07153a6f50391c896e4caebc Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8276889: Improve compatibility discussion in instanceKlass.cpp Reviewed-by: hseigel ------------- PR: https://git.openjdk.java.net/jdk/pull/6340 From iklam at openjdk.java.net Wed Nov 10 20:26:38 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 10 Nov 2021 20:26:38 GMT Subject: RFR: 8269986: Remove +3 from Symbol::identity_hash() In-Reply-To: <3QfGGc4vIbwBz-k8URuVmp2bVWOID4UQmEwKBSQo7Ls=.61a1aca5-7b04-4017-a37a-3f82a6327e9c@github.com> References: <3QfGGc4vIbwBz-k8URuVmp2bVWOID4UQmEwKBSQo7Ls=.61a1aca5-7b04-4017-a37a-3f82a6327e9c@github.com> Message-ID: On Tue, 9 Nov 2021 20:01:58 GMT, Coleen Phillimore wrote: >> Please review this trivial change that removes the `+3` from here: >> >> >> unsigned Symbol::identity_hash() const { >> unsigned addr_bits = (unsigned)((uintptr_t)this >> (LogMinObjAlignmentInBytes + 3)); >> ^^^ >> return ((unsigned)extract_hash(_hash_and_refcount) & 0xffff) | >> ((addr_bits ^ (length() << 8) ^ (( _body[0] << 8) | _body[1])) << 16); >> } >> >> >> The `+3` was 
intended to avoid getting the same value for these bits: >> >> >> ((uintptr_t)this) >> LogMinObjAlignmentInBytes) & 0x07) >> >> >> However, as shown in the [bug report](https://bugs.openjdk.java.net/browse/JDK-8269986), the values for these bits are evenly distributed. So the `+3` is not necessary and may actually be counter-productive. >> >> Testing: Oracle CI tiers 1-4 > > Looks good! Thanks for doing the performance analysis. Thanks @coleenp for the review. @cl4es also reviewed it off-line. Since the change is trivial, I am pushing it now. ------------- PR: https://git.openjdk.java.net/jdk/pull/6287 From iklam at openjdk.java.net Wed Nov 10 20:26:39 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 10 Nov 2021 20:26:39 GMT Subject: Integrated: 8269986: Remove +3 from Symbol::identity_hash() In-Reply-To: References: Message-ID: On Sun, 7 Nov 2021 21:10:35 GMT, Ioi Lam wrote: > Please review this trivial change that removes the `+3` from here: > > > unsigned Symbol::identity_hash() const { > unsigned addr_bits = (unsigned)((uintptr_t)this >> (LogMinObjAlignmentInBytes + 3)); > ^^^ > return ((unsigned)extract_hash(_hash_and_refcount) & 0xffff) | > ((addr_bits ^ (length() << 8) ^ (( _body[0] << 8) | _body[1])) << 16); > } > > > The `+3` was intended to avoid getting the same value for these bits: > > > ((uintptr_t)this) >> LogMinObjAlignmentInBytes) & 0x07) > > > However, as shown in the [bug report](https://bugs.openjdk.java.net/browse/JDK-8269986), the values for these bits are evenly distributed. So the `+3` is not necessary and may actually be counter-productive. > > Testing: Oracle CI tiers 1-4 This pull request has now been integrated. 
Changeset: df02daa6 Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/df02daa6f9df801a7e0b6203fd6411d8a62bb277 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8269986: Remove +3 from Symbol::identity_hash() Reviewed-by: coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/6287 From coleenp at openjdk.java.net Wed Nov 10 22:12:44 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 10 Nov 2021 22:12:44 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag Message-ID: This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message. Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug. ------------- Commit messages: - 8258192: Obsolete the CriticalJNINatives flag Changes: https://git.openjdk.java.net/jdk/pull/6343/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6343&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8258192 Stats: 1790 lines in 24 files changed: 0 ins; 1616 del; 174 mod Patch: https://git.openjdk.java.net/jdk/pull/6343.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6343/head:pull/6343 PR: https://git.openjdk.java.net/jdk/pull/6343 From dlong at openjdk.java.net Thu Nov 11 03:36:51 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 11 Nov 2021 03:36:51 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information Message-ID: The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: 1. added a version number to the replay file 2. removed unnused ci fields 3. 
corrected comment in TestLambdas.java ------------- Commit messages: - replay failure due to incomplete ciMethodData information Changes: https://git.openjdk.java.net/jdk/pull/6344/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276095 Stats: 59 lines in 7 files changed: 27 ins; 24 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/6344.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6344/head:pull/6344 PR: https://git.openjdk.java.net/jdk/pull/6344 From stuefe at openjdk.java.net Thu Nov 11 06:30:15 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 11 Nov 2021 06:30:15 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2] In-Reply-To: References: Message-ID: > This is part of a number of RFE I plan to improve and simplify C-heap overflow checking in hotspot. For the whole story please refer to https://bugs.openjdk.java.net/browse/JDK-8275301. > > This proposal adds NMT buffer overflow checking. As laid out in JDK-8275301: > > - it would give us C-heap overflow checking in release builds > - the additional costs are neglectable > - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. The error reports would also be confusing. > - it is a preparation for future code removal (the memory guarding done in debug only in os::malloc() and friends, and possibly the guarding done with CheckJNICalls) > > Patch notes: > > 1) The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. > > On 64-bit, we don't even need to enlarge the malloc header: we carve some bits out by decreasing the size of the bucket index bit field to 16 bits. The bucket index field is used to store the bucket slot of the malloc site table in NMT detail mode. 
The malloc site table width is 512 atm, so 65k gives plenty of room for growing the malloc site table should we ever want to. > > On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes. That is because there were not enough bits to spare for a canary. On the upside, 8 bytes were not enough anyway, strictly speaking, to guarantee proper alignment e.g. for 128bit data types on all 32-bit platforms. See e.g. the malloc alignment the glibc uses. > > I also took the freedom of re-arranging the malloc header fields a bit to minimize the difference between 32-bit and 64-bit platforms, and to align each field optimally according to its size. I also switched from bitfields to real types in order to be able to do a sizeof() on them. > > For more details, see the comment in mallocTracker.hpp. > > 2) I added a footer canary trailing the user allocation to catch tail buffer overruns. For simplicity reasons (alignment) and to save some cycles I made it a byte only. That is enough to catch most overrun scenarios. If you think this is too small, I'm open to change it. > > 3) I put a bit of work into error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting. > > 4) I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE). > > (Note that these gtests, to test anything, need to run with NMT switched on. We do this as part of our NMT jtreg-controlled gtests in tier1). > > Even though the patch adds more code than it removes, it prepares possible code removal (if we can agree to do that) and the net result will be less complexity, not more. Again, see JDK-8275301 for details. 
> > -------------- > > Example output a buffer overrun would provide: > > > Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) > NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: > 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 > 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00 > 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)# > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805 > # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) > # > > ------- > > Tests: > - manual tests with Linux x64, x86, minimal build > - GHAs all clean > - SAP nightlies ran for 14 days in a row without problems Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge - Let NMT do overflow detection ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5952/files - new: https://git.openjdk.java.net/jdk/pull/5952/files/f4a92cf5..e04a105d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=00-01 Stats: 886552 lines in 1706 files changed: 455452 ins; 424812 del; 6288 mod Patch: https://git.openjdk.java.net/jdk/pull/5952.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5952/head:pull/5952 PR: https://git.openjdk.java.net/jdk/pull/5952 From dholmes at openjdk.java.net Thu Nov 11 07:10:35 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 11 Nov 2021 07:10:35 GMT Subject: RFR: 8276658: Clean up JNI local handles code In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 17:16:29 GMT, Coleen Phillimore wrote: > JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread. > Move the fields to JavaThread and adding JavaThread* argument. > Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed. > Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code. > The commits are separate to help reviewing, but the entire change has been tested together with tier1-6. > The commits in this change have been performance tested individually and together with no meaningful differences from mainline. Hi Coleen, Nice cleanup and refactoring! I'm not familiar with all the details but the reshuffling looks good to me. One query and one minor issue below. 
Thanks, David src/hotspot/share/compiler/compileBroker.cpp line 2324: > 2322: // Remove the JNI handle block after the ciEnv destructor has run in > 2323: // the previous block. > 2324: pop_jni_handle_block(); Does the fact the JNIHandleMark destructor won't get executed until much later, at the end of this method, make any difference? src/hotspot/share/runtime/vmThread.hpp line 63: > 61: class VMThread: public NamedThread { > 62: private: > 63: volatile bool _is_running; I don't see this being initialized to false. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6336 From shade at openjdk.java.net Thu Nov 11 07:27:39 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 11 Nov 2021 07:27:39 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 22:06:05 GMT, Coleen Phillimore wrote: > This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message. > Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug. src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp line 1746: > 1744: // NW [ABI_REG_ARGS] <-- 1) R1_SP > 1745: // [outgoing arguments] <-- 2) R1_SP + out_arg_slot_offset > 1746: // [oopHandle area] <-- 3) R1_SP + oop_handle_offset (save area for critical natives) ? `?`. The comment `(save area for critical natives)` must be redundant now. src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1551: > 1549: int total_c_args = total_in_args+1; > 1550: if (method->is_static()) { > 1551: total_c_args++; In this patch, sometimes we keep the if structure, like here, but in other places, we replace this with: int total_c_args = total_in_args + (method->is_static() ? 2 : 1) Should probably stick with a single style. 
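[Editorial sketch, not part of the original message.] The style point above can be illustrated with a small standalone model, assuming (as the quoted snippets suggest) that the C argument count is the Java argument count plus one slot, plus one more slot for static methods. `FakeMethod`, `total_c_args_if` and `total_c_args_ternary` are hypothetical names invented for this illustration; they are not HotSpot APIs.

```cpp
#include <cassert>

// Hypothetical stand-in for the one property of HotSpot's Method that
// matters here; this is NOT the real class.
struct FakeMethod {
  bool static_flag;
  bool is_static() const { return static_flag; }
};

// Style A: the explicit if-structure, as kept in sharedRuntime_x86_64.cpp.
int total_c_args_if(const FakeMethod& method, int total_in_args) {
  int total_c_args = total_in_args + 1;  // one extra slot for every method
  if (method.is_static()) {
    total_c_args++;                      // one more slot for static methods
  }
  return total_c_args;
}

// Style B: the single conditional expression used elsewhere in the patch.
int total_c_args_ternary(const FakeMethod& method, int total_in_args) {
  return total_in_args + (method.is_static() ? 2 : 1);
}

// Both styles compute the same value; the choice is purely about consistency.
void check_styles_agree(int total_in_args) {
  const FakeMethod static_method{true};
  const FakeMethod instance_method{false};
  assert(total_c_args_if(static_method, total_in_args) ==
         total_c_args_ternary(static_method, total_in_args));
  assert(total_c_args_if(instance_method, total_in_args) ==
         total_c_args_ternary(instance_method, total_in_args));
}
```

Since the two forms are equivalent, picking one and using it in every port is enough to address the comment.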
src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1793: > 1791: int c_arg = arg_order.at(ai + 1); > 1792: __ block_comment(err_msg("move %d -> %d", i, c_arg)); > 1793: assert (c_arg != -1, "wrong direction"); `assert (c_arg != -1 && i != -1, "wrong direction");`? src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1905: > 1903: } else { > 1904: // Compute a valid move order, using tmp_vmreg to break any cycles > 1905: ComputeMoveOrder cmo(total_in_args, in_regs, total_c_args, out_regs, in_sig_bt, arg_order, tmp_vmreg); `ComputeMoveOrder` is still used somewhere, or? src/hotspot/share/runtime/sharedRuntime.cpp line 3019: > 3017: if (CriticalJNINatives && !method->is_method_handle_intrinsic()) { > 3018: // We perform the I/O with transition to native before acquiring AdapterHandlerLibrary_lock. > 3019: critical_entry = NativeLookup::lookup_critical_entry(method); `critical_entry` variable is now redundant? ------------- PR: https://git.openjdk.java.net/jdk/pull/6343 From shade at openjdk.java.net Thu Nov 11 07:30:37 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 11 Nov 2021 07:30:37 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v3] In-Reply-To: References: Message-ID: <81d18oGgsBvytKRcjrO6lkygVY2G6wY-UHntuB47Fso=.084c545c-b9f1-4a07-a264-677cdbb9a2d2@github.com> On Wed, 10 Nov 2021 18:03:00 GMT, Serguei Spitsyn wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> More reviews > > Marked as reviewed by sspitsyn (Reviewer). Thank you, @sspitsyn! Any more reviews, anyone? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5848 From chagedorn at openjdk.java.net Thu Nov 11 07:36:33 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 11 Nov 2021 07:36:33 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 03:28:40 GMT, Dean Long wrote: > The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: > 1. added a version number to the replay file > 2. removed unnused ci fields > 3. corrected comment in TestLambdas.java Looks good to me! Thanks for also adapting the changes from 8275868 to use the new version number. src/hotspot/share/ci/ciReplay.cpp line 900: > 898: > 899: // Only initialize the protection domain handle with the protection domain of the very first entry. > 900: // This also ensures that older replay files work. Second sentence can now be removed with version numbers. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6344 From duke at openjdk.java.net Thu Nov 11 08:48:07 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Thu, 11 Nov 2021 08:48:07 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. 
Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required.
Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Simplify branch protection configure check ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/e0e3f666..29471d30 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=00-01 Stats: 12 lines in 1 file changed: 0 ins; 6 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Thu Nov 11 08:49:36 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 11 Nov 2021 08:49:36 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v13] In-Reply-To: <31DzKXEmMNYWZ1NL3FroXD7dCIDhwBJNzRotZCkKTqg=.30bc4ee0-9701-4cf7-925d-27901f47cdcc@github.com> References: <31DzKXEmMNYWZ1NL3FroXD7dCIDhwBJNzRotZCkKTqg=.30bc4ee0-9701-4cf7-925d-27901f47cdcc@github.com> Message-ID: <6AGSxH9l4ABhczTNYkGSUQncGgENSWhYtOdnLAbsicY=.435c44c2-ec21-42f9-ab53-7f0891d0b42b@github.com> On Mon, 1 Nov 2021 13:11:40 GMT, Andrew Haley wrote: >> This test is too artificial. Going through my records I've found I have a microbenchmark for `java.util.concurrent. SynchronousQueue` which shows good improvements on jdk11. `SynchronousQueue` uses `onSpinWait`. Since jdk17 `SynchronousQueue` has not been using `onSpinWait` any more (See https://bugs.openjdk.java.net/browse/JDK-8267502). 
Maybe I can come up with a microbenchmark based on `SynchronousQueue` [code](https://github.com/openjdk/jdk11u-dev/blob/master/src/java.base/share/classes/java/util/concurrent/SynchronousQueue.java#L412): >> >> SNode awaitFulfill(SNode s, boolean timed, long nanos) { >> /* >> * When a node/thread is about to block, it sets its waiter >> * field and then rechecks state at least one more time >> * before actually parking, thus covering race vs >> * fulfiller noticing that waiter is non-null so should be >> * woken. >> * >> * When invoked by nodes that appear at the point of call >> * to be at the head of the stack, calls to park are >> * preceded by spins to avoid blocking when producers and >> * consumers are arriving very close in time. This can >> * happen enough to bother only on multiprocessors. >> * >> * The order of checks for returning out of main loop >> * reflects fact that interrupts have precedence over >> * normal returns, which have precedence over >> * timeouts. (So, on timeout, one last check for match is >> * done before giving up.) Except that calls from untimed >> * SynchronousQueue.{poll/offer} don't check interrupts >> * and don't wait at all, so are trapped in transfer >> * method rather than calling awaitFulfill. >> */ >> final long deadline = timed ? System.nanoTime() + nanos : 0L; >> Thread w = Thread.currentThread(); >> int spins = shouldSpin(s) >> ? (timed ? MAX_TIMED_SPINS : MAX_UNTIMED_SPINS) >> : 0; >> for (;;) { >> if (w.isInterrupted()) >> s.tryCancel(); >> SNode m = s.match; >> if (m != null) >> return m; >> if (timed) { >> nanos = deadline - System.nanoTime(); >> if (nanos <= 0L) { >> s.tryCancel(); >> continue; >> } >> } >> if (spins > 0) { >> Thread.onSpinWait(); >> spins = shouldSpin(s) ? 
(spins - 1) : 0; >> } >> else if (s.waiter == null) >> s.waiter = w; // establish waiter so can park next iter >> else if (!timed) >> LockSupport.park(this); >> else if (nanos > SPIN_FOR_TIMEOUT_THRESHOLD) >> LockSupport.parkNanos(this, nanos); >> } >> } >> >> >> I've created https://bugs.openjdk.java.net/browse/JDK-8275728 to write such a microbenchmark. > > I suggest you do https://bugs.openjdk.java.net/browse/JDK-8275728 before you commit this. A benchmark which proves that this patch has some utility is needed, isn't it? > Hi Andrew (@theRealAph), I've created a PR: #6338 with a microbenchmark. That's really weird. Why is the benchmark not here? ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From duke at openjdk.java.net Thu Nov 11 09:36:36 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 11 Nov 2021 09:36:36 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v15] In-Reply-To: References: Message-ID: <9vTWqXSA_S6TE9iMxqpDBY4kj9smBnxMhE2hBna2V2Q=.2457f83a-c3b3-4fb5-b919-a4fdb843d8b7@github.com> > This PR is a follow-up on the discussion [?RFC: AArch64: Implementing spin pauses with ISB?](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). > > It adds DIAGNOSTIC options `OnSpinWaitInst=inst`, where `inst` can be: > > - `none`: no implementation for spin pauses. This is the default value. > - `nop`: use `nop` instruction for spin pauses. > - `isb`: use `isb` instruction for spin pauses. > - `yield`: use `yield` instruction for spin pauses. > > And `OnSpinWaitInstCount=count`, where `count` specifies a number of `OnSpinWaitInst` and can be in `1..99` range. It is an error to use `OnSpinWaitInstCount` when `OnSpinWaitInst` is `none`. > > The code for the `Thread.onSpinWait` intrinsic is generated based on the values of `OnSpinWaitInst` and `OnSpinWaitInstCount`. 
> > Testing: > > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > CSR: https://bugs.openjdk.java.net/browse/JDK-8274564 Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: 8275728: Add simple Producer/Consumer microbenchmark for Thread.onSpinWait ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5562/files - new: https://git.openjdk.java.net/jdk/pull/5562/files/a06b4821..0d6fc3f0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=14 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=13-14 Stats: 204 lines in 1 file changed: 204 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5562.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5562/head:pull/5562 PR: https://git.openjdk.java.net/jdk/pull/5562 From duke at openjdk.java.net Thu Nov 11 09:42:38 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 11 Nov 2021 09:42:38 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v15] In-Reply-To: <9vTWqXSA_S6TE9iMxqpDBY4kj9smBnxMhE2hBna2V2Q=.2457f83a-c3b3-4fb5-b919-a4fdb843d8b7@github.com> References: <9vTWqXSA_S6TE9iMxqpDBY4kj9smBnxMhE2hBna2V2Q=.2457f83a-c3b3-4fb5-b919-a4fdb843d8b7@github.com> Message-ID: On Thu, 11 Nov 2021 09:36:36 GMT, Evgeny Astigeevich wrote: >> This PR is a follow-up on the discussion [?RFC: AArch64: Implementing spin pauses with ISB?](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). >> >> It adds DIAGNOSTIC options `OnSpinWaitInst=inst`, where `inst` can be: >> >> - `none`: no implementation for spin pauses. This is the default value. >> - `nop`: use `nop` instruction for spin pauses. >> - `isb`: use `isb` instruction for spin pauses. >> - `yield`: use `yield` instruction for spin pauses. 
>> >> And `OnSpinWaitInstCount=count`, where `count` specifies a number of `OnSpinWaitInst` and can be in `1..99` range. It is an error to use `OnSpinWaitInstCount` when `OnSpinWaitInst` is `none`. >> >> The code for the `Thread.onSpinWait` intrinsic is generated based on the values of `OnSpinWaitInst` and `OnSpinWaitInstCount`. >> >> Testing: >> >> - `make test TEST="gtest"`: Passed >> - `make run-test TEST="tier1"`: Passed >> - `make run-test TEST="tier2"`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed >> >> CSR: https://bugs.openjdk.java.net/browse/JDK-8274564 > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > 8275728: Add simple Producer/Consumer microbenchmark for Thread.onSpinWait `ThreadOnSpinWaitProducerConsumer` is to demonstrate `Thread.onSpinWait` can be used to avoid heavy locks. The microbenchmark differs from [Gil's original benchmark](https://github.com/giltene/GilExamples/tree/master/SpinWaitTest) and [Dmitry's variations](http://cr.openjdk.java.net/~dchuyko/8186670/yield/spinwait.html). Those benchmarks produce/consume data by incrementing a volatile counter. The latency of such operations is almost zero. They also don't use heavy locks. According to [Gil's SpinWaitTest.java](https://github.com/giltene/GilExamples/blob/master/SpinWaitTest/src/main/java/SpinWaitTest.java): > This test can be used to measure and document the impact of Runtime.onSpinWait() behavior > on thread-to-thread communication latencies. E.g. when the two threads are pinned to > the two hardware threads of a shared x86 core (with a shared L1), this test will > demonstrate an estimate the best case thread-to-thread latencies possible on the > platform Gil's microbenchmark targets SMT cases (x86 hyperthreading). As not all CPUs support SMT, the microbenchmarks cannot demonstrate benefits of `Thread.onSpinWait`. It is actually opposite. 
They show `Thread.onSpinWait` has a negative impact on performance. The microbenchmark from the PR uses `BigInteger` to get 100 - 200 ns latencies for producing/consuming data. These latencies can cause either a producer or a consumer to wait for the other. Waiting is implemented with `Object.wait`/`Object.notify`, which are heavy. `Thread.onSpinWait` can be used in a spin loop to avoid them.

**ARM64 results**:

- No spin loop

Benchmark                               (maxNum)  (spinNum)  Mode  Cnt     Score     Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100          0  avgt   75  1520.448 ±  40.507  us/op

- No `Thread.onSpinWait` intrinsic

Benchmark                               (maxNum)  (spinNum)  Mode  Cnt     Score     Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100        125  avgt   75  1580.756 ±  47.501  us/op

- `ISB`-based `Thread.onSpinWait` intrinsic

Benchmark                               (maxNum)  (spinNum)  Mode  Cnt     Score     Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100        125  avgt   75   617.454 ± 174.431  us/op

**X86_64 results**:

- No spin loop

Benchmark                               (maxNum)  (spinNum)  Mode  Cnt     Score     Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100        125  avgt   75  1417.944 ±   1.691  us/op

- No `Thread.onSpinWait` intrinsic

Benchmark                               (maxNum)  (spinNum)  Mode  Cnt     Score     Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100        125  avgt   75  1410.987 ±   2.093  us/op

- `PAUSE`-based `Thread.onSpinWait` intrinsic

Benchmark                               (maxNum)  (spinNum)  Mode  Cnt     Score     Error  Units
ThreadOnSpinWaitProducerConsumer.trial       100        125  avgt   75   217.054 ±   1.283  us/op

------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From duke at openjdk.java.net Thu Nov 11 09:42:39 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 11 Nov 2021 09:42:39 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v13] In-Reply-To: <6AGSxH9l4ABhczTNYkGSUQncGgENSWhYtOdnLAbsicY=.435c44c2-ec21-42f9-ab53-7f0891d0b42b@github.com> References: <31DzKXEmMNYWZ1NL3FroXD7dCIDhwBJNzRotZCkKTqg=.30bc4ee0-9701-4cf7-925d-27901f47cdcc@github.com> <6AGSxH9l4ABhczTNYkGSUQncGgENSWhYtOdnLAbsicY=.435c44c2-ec21-42f9-ab53-7f0891d0b42b@github.com> Message-ID: <1GRLIikoCIaOxbXAx7d5DhHz7ne8pjiKTB9t7mRNPIk=.1832e508-3f36-4893-94bc-02c0304bda23@github.com> On Thu, 11 Nov 2021 08:46:23 GMT, Andrew Haley wrote: >> I suggest you do https://bugs.openjdk.java.net/browse/JDK-8275728 before you commit this. A benchmark which proves that this patch has some utility is needed, isn't it? > >> Hi Andrew (@theRealAph), I've created a PR: #6338 with a microbenchmark. > > That's really weird. Why is the benchmark not here? I thought a separate PR would simplify a discussion. Sorry if I was wrong. I added it here. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From simonis at openjdk.java.net Thu Nov 11 09:48:35 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 11 Nov 2021 09:48:35 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6] In-Reply-To: References: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> Message-ID: On Wed, 10 Nov 2021 16:56:07 GMT, Martin Doerr wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
>
> src/hotspot/share/prims/whitebox.cpp line 987:
>
>> 985: bool overflow = false;
>> 986: for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
>> 987: if (reason_str != NULL && !strcmp(reason_str, Deoptimization::trap_reason_name(reason))) {
>
> Maybe the code would be better readable when checking `reason_str != NULL` first and then use 2 loops? Just a minor suggestion. Should only be done if readability is better.

I've tried it but the resulting version is slightly longer and in my opinion not really more readable:

WB_ENTRY(jint, WB_GetMethodTrapCount(JNIEnv* env, jobject o, jobject method, jstring reason_obj))
  jmethodID jmid = reflected_method_to_jmid(thread, env, method);
  CHECK_JNI_EXCEPTION_(env, 0);
  methodHandle mh(THREAD, Method::checked_resolve_jmethod_id(jmid));
  uint cnt = 0;
  MethodData* mdo = mh->method_data();
  if (mdo != NULL) {
    ResourceMark rm(THREAD);
    if (reason_obj != NULL) {
      char* reason_str = java_lang_String::as_utf8_string(JNIHandles::resolve_non_null(reason_obj));
      for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
        if (!strcmp(reason_str, Deoptimization::trap_reason_name(reason))) {
          cnt = mdo->trap_count(reason);
          // Count in the overflow trap count on overflow
          if (cnt == (uint)-1) {
            cnt = mdo->trap_count_limit() + mdo->overflow_trap_count();
          }
          break;
        }
      }
    } else {
      bool overflow = false;
      for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) {
        uint c = mdo->trap_count(reason);
        if (c == (uint)-1) {
          c = mdo->trap_count_limit();
          if (!overflow) {
            // Count overflow trap count just once
            overflow = true;
            c += mdo->overflow_trap_count();
          }
        }
        cnt += c;
      }
    }
  }
  return cnt;
WB_END

But for me it's actually no difference. Please just let me know if you'd still prefer the alternative version.

PS: I've updated the documentation of the method which was inaccurate for `reason==NULL`.
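[Editorial sketch, not part of the original message.] The counting rule in the two-loop variant above can be tried outside HotSpot with a small standalone model. Everything here is an assumption made for illustration: `FakeTrap`, `FakeMethodData`, `kOverflowMarker` and `trap_count` are hypothetical names, and the only behaviour modelled is what the quoted code shows, namely that a per-reason count of `(uint)-1` means "overflowed", that a named reason reports limit plus overflow count in that case, and that the all-reasons sum adds the shared overflow count only once.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Marker used by the model for "this per-reason counter has overflowed".
const unsigned kOverflowMarker = static_cast<unsigned>(-1);

// Hypothetical per-reason record; not a HotSpot type.
struct FakeTrap {
  const char* name;
  unsigned count;  // kOverflowMarker when overflowed
};

// Hypothetical stand-in for the parts of MethodData used by the quoted code.
struct FakeMethodData {
  std::vector<FakeTrap> traps;
  unsigned count_limit;
  unsigned overflow_count;
};

// reason_str != nullptr: count for that one reason (0 if the name is unknown).
// reason_str == nullptr: sum over all reasons, adding overflow_count once.
unsigned trap_count(const FakeMethodData& mdo, const char* reason_str) {
  // The limit must stay distinguishable from the overflow marker.
  assert(mdo.count_limit != kOverflowMarker);
  unsigned cnt = 0;
  if (reason_str != nullptr) {
    for (const FakeTrap& t : mdo.traps) {
      if (std::strcmp(reason_str, t.name) == 0) {
        cnt = t.count;
        if (cnt == kOverflowMarker) {
          cnt = mdo.count_limit + mdo.overflow_count;
        }
        break;
      }
    }
  } else {
    bool overflow = false;
    for (const FakeTrap& t : mdo.traps) {
      unsigned c = t.count;
      if (c == kOverflowMarker) {
        c = mdo.count_limit;
        if (!overflow) {
          overflow = true;  // add the shared overflow count just once
          c += mdo.overflow_count;
        }
      }
      cnt += c;
    }
  }
  return cnt;
}
```

Passing a reason name selects the first branch and passing `nullptr` selects the summing branch, which mirrors the `reason_obj != NULL` split in the quoted code.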
> src/hotspot/share/prims/whitebox.cpp line 1016:
>
>> 1014:   }
>> 1015:   ResourceMark rm(THREAD);
>> 1016:   char *reason_str = (reason_obj == NULL) ?
>
> I think we should use `const char*` as far as possible.

Done.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From simonis at openjdk.java.net Thu Nov 11 09:54:38 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 09:54:38 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6]
In-Reply-To:
References: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>
Message-ID:

On Wed, 10 Nov 2021 16:57:14 GMT, Martin Doerr wrote:

>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
>
> src/hotspot/share/runtime/deoptimization.cpp line 2695:
>
>> 2693:   return 0;
>> 2694: }
>> 2695:
>
> Why do we need this? Is it a placeholder for a future enhancement? If so, a comment would at least be helpful.

That's a tricky one :)
It's needed to fix the Minimal/Zero builds. It's inside the `#else` branch of a `#ifdef COMPILER2_OR_JVMCI` condition together with a bunch of other methods which have an empty body in the case we have no C2 or JVMCI.

Could certainly be implemented more elegantly but I decided to adhere to the current coding style in `deoptimization.cpp` :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From simonis at openjdk.java.net Thu Nov 11 10:00:39 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 10:00:39 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6]
In-Reply-To:
References: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com>
Message-ID:

On Wed, 10 Nov 2021 17:06:06 GMT, Martin Doerr wrote:

>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
>
> test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 78:
>
>> 76:     private static final WhiteBox WB = WhiteBox.getWhiteBox();
>> 77:     // Until JDK-8275908 is not fixed, null-pointer traps for invokes and array-store traps are not profiled in the interpreter.
>> 78:     private static final boolean JDK8275908_fixed = false;
>
> I don't know if that one should get fixed first, but I'm ok with your workaround. Would it make sense to add that bug id to this test's header?

This PR has now been open for such a long time that I'd like to complete it without the dependency on another fix. But adding the bug id to the test is a good idea. Done

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From nradomski at openjdk.java.net Thu Nov 11 10:16:58 2021
From: nradomski at openjdk.java.net (Niklas Radomski)
Date: Thu, 11 Nov 2021 10:16:58 GMT
Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le
Message-ID:

Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le.

-------------

Commit messages:
 - Port shenandoahgc to linux on ppc64le

Changes: https://git.openjdk.java.net/jdk/pull/6325/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6325&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276927
  Stats: 1526 lines in 8 files changed: 1524 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6325.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6325/head:pull/6325

PR: https://git.openjdk.java.net/jdk/pull/6325

From simonis at openjdk.java.net Thu Nov 11 10:28:04 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 11 Nov 2021 10:28:04 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v7]
In-Reply-To:
References:
Message-ID:

> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>
>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274).
Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Minor enhancements and fixes requested by Martin ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5488/files - new: https://git.openjdk.java.net/jdk/pull/5488/files/99db7e54..625da2f9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=05-06 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From simonis at openjdk.java.net Thu Nov 11 10:28:08 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 11 Nov 2021 10:28:08 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6] In-Reply-To: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> References: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> Message-ID: On Thu, 4 Nov 2021 16:28:52 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it. Hi Martin, thanks a lot for looking at my PR one more time. I've just pushed an updated version which should address all your points. Still anything missing? 
Best regards,
Volker

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From mdoerr at openjdk.java.net Thu Nov 11 10:54:38 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 11 Nov 2021 10:54:38 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 11 Nov 2021 10:28:04 GMT, Volker Simonis wrote:

>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>>
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
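
To make the pattern in the quoted description concrete, here is a minimal Java sketch that contrasts the exception-driven lookup with an explicit range check. The class name `IsAlphaSketch`, the table size and the table contents are assumptions made for illustration; this is not Tomcat's code, and the sketch makes no claim about timings. It only shows that both variants return the same answers, while the first one relies on an implicit `ArrayIndexOutOfBoundsException` for out-of-range input, which is the case this PR targets.

```java
/** Illustrative only: hypothetical lookup table, not Tomcat's HttpParser. */
final class IsAlphaSketch {
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) {
            IS_ALPHA[c] = true;
        }
        for (int c = 'A'; c <= 'Z'; c++) {
            IS_ALPHA[c] = true;
        }
    }

    /** Same shape as the quoted method: an out-of-range index throws implicitly. */
    static boolean isAlphaViaException(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    /** Explicit range check: no exception is created for out-of-range input. */
    static boolean isAlphaViaCheck(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }
}
```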
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Minor enhancements and fixes requested by Martin Thanks for the updates. LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5488 From mdoerr at openjdk.java.net Thu Nov 11 10:54:38 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 11 Nov 2021 10:54:38 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v6] In-Reply-To: References: <0II1AGxTmud7vWbpWeKGm_vPr_yqFuVaauWEzBN_pMw=.f09fc7be-1833-431b-8353-161b9dad3cf4@github.com> Message-ID: <1EDY97O7mQZB96nsPoxILTsIaRRoiVmKWkOzq-2ANd8=.2498976c-b5bf-4391-b055-2acb46d99a79@github.com> On Thu, 11 Nov 2021 09:40:44 GMT, Volker Simonis wrote: >> src/hotspot/share/prims/whitebox.cpp line 987: >> >>> 985: bool overflow = false; >>> 986: for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) { >>> 987: if (reason_str != NULL && !strcmp(reason_str, Deoptimization::trap_reason_name(reason))) { >> >> Maybe the code would be better readable when checking `reason_str != NULL` first and then use 2 loops? Just a minor suggestion. Should only be done if readability is better. 
> > I've tried it but the resulting version is slightly longer and in my opinion not really more readable: > > WB_ENTRY(jint, WB_GetMethodTrapCount(JNIEnv* env, jobject o, jobject method, jstring reason_obj)) > jmethodID jmid = reflected_method_to_jmid(thread, env, method); > CHECK_JNI_EXCEPTION_(env, 0); > methodHandle mh(THREAD, Method::checked_resolve_jmethod_id(jmid)); > uint cnt = 0; > MethodData* mdo = mh->method_data(); > if (mdo != NULL) { > ResourceMark rm(THREAD); > if (reason_obj != NULL) { > char* reason_str = java_lang_String::as_utf8_string(JNIHandles::resolve_non_null(reason_obj)); > for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) { > if (!strcmp(reason_str, Deoptimization::trap_reason_name(reason))) { > cnt = mdo->trap_count(reason); > // Count in the overflow trap count on overflow > if (cnt == (uint)-1) { > cnt = mdo->trap_count_limit() + mdo->overflow_trap_count(); > } > break; > } > } > } else { > bool overflow = false; > for (uint reason = 0; reason < mdo->trap_reason_limit(); reason++) { > uint c = mdo->trap_count(reason); > if (c == (uint)-1) { > c = mdo->trap_count_limit(); > if (!overflow) { > // Count overflow trap count just once > overflow = true; > c += mdo->overflow_trap_count(); > } > } > cnt += c; > } > } > } > return cnt; > WB_END > > > But for me it's actually no difference. Please just let me know if you'd still prefer the alternative version. > > PS: I've updated the documentation of the method which was inaccurate for `reason==NULL`. Your two loop version looks a bit easier to read for me, but that may be a matter of taste. I leave you free to decide. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From adinn at openjdk.java.net Thu Nov 11 11:21:37 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 11 Nov 2021 11:21:37 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 13:22:37 GMT, Andrew Haley wrote: >> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplify branch protection configure check > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5185: > >> 5183: // ROP Protection >> 5184: >> 5185: void MacroAssembler::protect_return_address() { > > We need proper, full, detailed comments about what these functions do, with reference to primary AArch64 documentation. As far as the AArch64 docs are concerned the relevant details are provided in ARM-ARM D - The PAC functionality is described in ARM-ARM Section D5.1.5 - Overview of the PAC instructions is provided in section C3.1.9 - Detailed PAC instruction descriptions are provided in C6.2.195 - C6.2.199 n.b. I am specifically referring to my (possibly out of date) copy ARM-DDI 0487D.a (ID103018) which is the Initial v8.4 EAC release from 2018. That said, I agree that a description of how these functions use the underlying PAC support and what, effectively, they achieve via that usage would be necessary. A reference to the relevant sections of the ARM doc in the code would be helpful. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From adinn at openjdk.java.net Thu Nov 11 11:37:40 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 11 Nov 2021 11:37:40 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 11:19:03 GMT, Andrew Dinn wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5185: >> >>> 5183: // ROP Protection >>> 5184: >>> 5185: void MacroAssembler::protect_return_address() { >> >> We need proper, full, detailed comments about what these functions do, with reference to primary AArch64 documentation. > > As far as the AArch64 docs are concerned the relevant details are provided in ARM-ARM D > > - The PAC functionality is described in ARM-ARM Section D5.1.5 > - Overview of the PAC instructions is provided in section C3.1.9 > - Detailed PAC instruction descriptions are provided in C6.2.195 - C6.2.199 > > n.b. I am specifically referring to my (possibly out of date) copy ARM-DDI 0487D.a (ID103018) which is the Initial v8.4 EAC release from 2018. > > That said, I agree that a description of how these functions use the underlying PAC support and what, effectively, they achieve via that usage would be necessary. A reference to the relevant sections of the ARM doc in the code would be helpful. 
Correction:
Using the most up to date ARM ARM G [ARM DDI 0487G.a (ID011921)]

- The PAC functionality is described in ARM-ARM Section D5.1.5
- Overview of the PAC instructions is provided in section C3.1.10
- Detailed PAC instruction descriptions are provided in C6.2.208 - C6.2.212

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net Thu Nov 11 11:39:41 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 11 Nov 2021 11:39:41 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v15]
In-Reply-To: <9vTWqXSA_S6TE9iMxqpDBY4kj9smBnxMhE2hBna2V2Q=.2457f83a-c3b3-4fb5-b919-a4fdb843d8b7@github.com>
References: <9vTWqXSA_S6TE9iMxqpDBY4kj9smBnxMhE2hBna2V2Q=.2457f83a-c3b3-4fb5-b919-a4fdb843d8b7@github.com>
Message-ID:

On Thu, 11 Nov 2021 09:36:36 GMT, Evgeny Astigeevich wrote:

>> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>>
>> It adds DIAGNOSTIC options `OnSpinWaitInst=inst`, where `inst` can be:
>>
>> - `none`: no implementation for spin pauses. This is the default value.
>> - `nop`: use `nop` instruction for spin pauses.
>> - `isb`: use `isb` instruction for spin pauses.
>> - `yield`: use `yield` instruction for spin pauses.
>>
>> And `OnSpinWaitInstCount=count`, where `count` specifies a number of `OnSpinWaitInst` and can be in `1..99` range. It is an error to use `OnSpinWaitInstCount` when `OnSpinWaitInst` is `none`.
>>
>> The code for the `Thread.onSpinWait` intrinsic is generated based on the values of `OnSpinWaitInst` and `OnSpinWaitInstCount`.
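
For context, `Thread.onSpinWait()` is the Java-level hint that this intrinsic implements. The following is a minimal sketch of a spin loop that calls it; the class `SpinFlag` and its method names are hypothetical and are not taken from the PR or its microbenchmark. Which instruction, if any, the JIT emits for the call depends on the platform and on the options described above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical one-shot flag that a consumer busy-waits on. */
final class SpinFlag {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    /** Called by the producer to release the waiting thread. */
    void signal() {
        ready.set(true);
    }

    /** Busy-waits until signal() has been called; returns the number of spins. */
    long awaitSpinning() {
        long spins = 0;
        while (!ready.get()) {
            // Hint to the runtime that this thread is in a spin-wait loop.
            Thread.onSpinWait();
            spins++;
        }
        return spins;
    }
}
```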
>>
>> Testing:
>>
>> - `make test TEST="gtest"`: Passed
>> - `make run-test TEST="tier1"`: Passed
>> - `make run-test TEST="tier2"`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>>
>> CSR: https://bugs.openjdk.java.net/browse/JDK-8274564
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   8275728: Add simple Producer/Consumer microbenchmark for Thread.onSpinWait

Marked as reviewed by aph (Reviewer).

I'm getting this for `-XX:OnSpinWaitInst=yield` on Apple M1:

    Benchmark                               (maxNum)  (spinNum)    Score   Error  Units
    ThreadOnSpinWaitProducerConsumer.trial       100        125  355.686 ± 1.263  us/op

This for `-XX:OnSpinWaitInst=none`:

    ThreadOnSpinWaitProducerConsumer.trial       100        125  359.635 ± 0.912  us/op

This for `-XX:OnSpinWaitInst=isb`:

    ThreadOnSpinWaitProducerConsumer.trial       100        125  169.353 ± 3.932  us/op

Which looks pretty convincing, at least for this benchmark. I'm a bit concerned that it took so much effort to find a convincing benchmark, but I note that OnSpinWaitInst=isb doesn't seem to make anything worse, so OK.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From duke at openjdk.java.net Thu Nov 11 11:47:36 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Thu, 11 Nov 2021 11:47:36 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 11 Nov 2021 11:34:09 GMT, Andrew Dinn wrote:

>> As far as the AArch64 docs are concerned the relevant details are provided in ARM-ARM D
>>
>> - The PAC functionality is described in ARM-ARM Section D5.1.5
>> - Overview of the PAC instructions is provided in section C3.1.9
>> - Detailed PAC instruction descriptions are provided in C6.2.195 - C6.2.199
>>
>> n.b. I am specifically referring to my (possibly out of date) copy ARM-DDI 0487D.a (ID103018) which is the Initial v8.4 EAC release from 2018.
>> >> That said, I agree that a description of how these functions use the underlying PAC support and what, effectively, they achieve via that usage would be necessary. A reference to the relevant sections of the ARM doc in the code would be helpful. > > Correction: > Using the most up to date ARM ARM G [ARM DDI 0487G.a (ID011921)] > > - The PAC functionality is described in ARM-ARM Section D5.1.5 > - Overview of the PAC instructions is provided in section C3.1.10 > - Detailed PAC instruction descriptions are provided in C6.2.208 - C6.2.212 I'm thinking for references to the Arm Arm to use header titles instead of section numbers, as the titles should be more stable. Also probably need some description around the code in the pauth_aarch64.hpp too. But I want to make sure I'm not duplicating comments - maybe the macroassembler comments should point to the pauth_aarch64 comments. It didn't seen common in the code to describe instruction functionality, which is why I didn't add any. Agreed it needs something added though. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Thu Nov 11 11:55:34 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 11 Nov 2021 11:55:34 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: <_9P-UvEbKu8NkBMq_pPr-_-muZxxHfwa2vV7h9nq6ZQ=.11e2de58-4e9e-4364-bfc9-c17791a19933@github.com> On Thu, 11 Nov 2021 11:44:09 GMT, Alan Hayward wrote: >> Correction: >> Using the most up to date ARM ARM G [ARM DDI 0487G.a (ID011921)] >> >> - The PAC functionality is described in ARM-ARM Section D5.1.5 >> - Overview of the PAC instructions is provided in section C3.1.10 >> - Detailed PAC instruction descriptions are provided in C6.2.208 - C6.2.212 > > I'm thinking for references to the Arm Arm to use header titles instead of section numbers, as the titles should be more stable. 
> > Also probably need some description around the code in the pauth_aarch64.hpp too. But I want to make sure I'm not duplicating comments - maybe the macroassembler comments should point to the pauth_aarch64 comments. > > It didn't seen common in the code to describe instruction functionality, which is why I didn't add any. Agreed it needs something added though. Yeah. At the definitions of `authenticate_return_address()` et al you can say what you expect in the normal case and what you expect when you've been hacked, along with an overview. I realize that it was a bit tricky to make this work with HotSpot because we're synthesizing return addresses just like hackers do, so a comment where we're patching return addresses would be nice. As long as the instructions are easily findable in the docs that's good. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Thu Nov 11 11:59:35 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 11 Nov 2021 11:59:35 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: <_9P-UvEbKu8NkBMq_pPr-_-muZxxHfwa2vV7h9nq6ZQ=.11e2de58-4e9e-4364-bfc9-c17791a19933@github.com> References: <_9P-UvEbKu8NkBMq_pPr-_-muZxxHfwa2vV7h9nq6ZQ=.11e2de58-4e9e-4364-bfc9-c17791a19933@github.com> Message-ID: On Thu, 11 Nov 2021 11:52:46 GMT, Andrew Haley wrote: >> I'm thinking for references to the Arm Arm to use header titles instead of section numbers, as the titles should be more stable. >> >> Also probably need some description around the code in the pauth_aarch64.hpp too. But I want to make sure I'm not duplicating comments - maybe the macroassembler comments should point to the pauth_aarch64 comments. >> >> It didn't seen common in the code to describe instruction functionality, which is why I didn't add any. Agreed it needs something added though. > > Yeah. 
At the definitions of `authenticate_return_address()` et al you can say what you expect in the normal case and what you expect when you've been hacked, along with an overview. I realize that it was a bit tricky to make this work with HotSpot because we're synthesizing return addresses just like hackers do, so a comment where we're patching return addresses would be nice. > > As long as the instructions are easily findable in the docs that's good. Just to be clear: no, don't describe instructions. describe what the macros do, and when to use them. Imagine that you, the reader can't see the contents of the macro at all, just the name and the comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Thu Nov 11 12:02:41 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 11 Nov 2021 12:02:41 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v15] In-Reply-To: References: <9vTWqXSA_S6TE9iMxqpDBY4kj9smBnxMhE2hBna2V2Q=.2457f83a-c3b3-4fb5-b919-a4fdb843d8b7@github.com> Message-ID: On Thu, 11 Nov 2021 11:35:17 GMT, Andrew Haley wrote: > I'm a bit concerned that it took so much effort to find a convincing benchmark, but I note that OnSpinWaitInst=isb doesn't seem to make anything worse, so OK. Thank you Andrew. It took the time to study the current use cases of `Thread.onSpinWait` why they got performance improved or did not. As usual when you have written something simple you need to check it is correct. All of these took most of the time. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From ihse at openjdk.java.net Thu Nov 11 12:05:43 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Thu, 11 Nov 2021 12:05:43 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 08:48:07 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. 
One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>>
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>>
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>>
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>>
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>>
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>>
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:
>
>   Simplify branch protection configure check

Build changes look much better now, thanks! Build part approved; the actual code changes need approval from others.

-------------

Marked as reviewed by ihse (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6334

From mdoerr at openjdk.java.net Thu Nov 11 12:06:34 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 11 Nov 2021 12:06:34 GMT
Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag
In-Reply-To:
References:
Message-ID:

On Wed, 10 Nov 2021 22:06:05 GMT, Coleen Phillimore wrote:

> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug.

Thanks for taking care of all platforms. `move_ptr(MacroAssembler*, VMRegPair, VMRegPair, int)` needs to be removed to avoid build warnings on PPC64 and s390.

-------------

Changes requested by mdoerr (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6343 From rkennke at openjdk.java.net Thu Nov 11 12:10:41 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 11 Nov 2021 12:10:41 GMT Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 09:00:04 GMT, Niklas Radomski wrote: > Port the Shenandoah garbage collector (JDK-8241457)[https://bugs.openjdk.java.net/browse/JDK-8241457] to linux on ppc64le. Hi Niklas, thanks for this awesome work! I can't really comment on the actual PPC code, so this needs to be reviewed by somebody else. Structurally the change looks correct. I have one comment about the C1 CAS barrier code, but it's minor. Thanks & cheers, Roman src/hotspot/cpu/ppc/gc/shenandoah/c1/shenandoahBarrierSetC1_ppc.cpp line 83: > 81: LIRGenerator* gen = access.gen(); > 82: > 83: if (ShenandoahCASBarrier) { I am not sure, but I almost think we should not even end up in the method with -ShenandoahCASBarrier. If anything, -ShenandoahCASBarrier should result in only calling super to emit regular CAS without any barriers. ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6325 From ihse at openjdk.java.net Thu Nov 11 12:21:32 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Thu, 11 Nov 2021 12:21:32 GMT Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 09:00:04 GMT, Niklas Radomski wrote: > Port the Shenandoah garbage collector (JDK-8241457)[https://bugs.openjdk.java.net/browse/JDK-8241457] to linux on ppc64le. Build changes look good. Actual code changes needs to be reviewed by someone more knowledgable about this area. ------------- Marked as reviewed by ihse (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6325 From mcimadamore at openjdk.java.net Thu Nov 11 13:03:56 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Thu, 11 Nov 2021 13:03:56 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v23] In-Reply-To: References: Message-ID: > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: - Merge branch 'master' into JEP-419 - Revert removal of upcall MH customization (This change caused spurious VM crashes, so reverting to baseline) - Further tweak upcall safety considerations - Clarify safety considerations for upcalls - Rename MemorySegment::ofAddressNative to MemorySegment::ofAddress (which is consistent with other restricted factories in VaList and NativeSymbol) - Streamline javadoc for package-info - * Add two new CLinker static methods to compute upcall/downcall method types * Clarify section on CLinker downcall type * Add section on CLinker safety guarantees - Fix TestUpcall * reverse() has a bug, as it doesn't tweak parameter types * reverse() is applied to the wrong MH - Make ArenaAllocator impl more flexible in the face of OOME An ArenaAllocator should remain open for business, even if OOME is thrown in case other allocations can fit the arena size. - Simplify ArenaAllocator impl. The arena should respect its boundaries and never allocate more memory than its size specifies. - ... 
and 22 more: https://git.openjdk.java.net/jdk/compare/aea09677...8c3860f8 ------------- Changes: https://git.openjdk.java.net/jdk/pull/5907/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=22 Stats: 14686 lines in 193 files changed: 6956 ins; 5120 del; 2610 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From coleenp at openjdk.java.net Thu Nov 11 13:35:34 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 11 Nov 2021 13:35:34 GMT Subject: RFR: 8276658: Clean up JNI local handles code In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 06:35:59 GMT, David Holmes wrote: >> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread. >> Move the fields to JavaThread and adding JavaThread* argument. >> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed. >> Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code. >> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6. >> The commits in this change have been performance tested individually and together with no meaningful differences from mainline. > > src/hotspot/share/runtime/vmThread.hpp line 63: > >> 61: class VMThread: public NamedThread { >> 62: private: >> 63: volatile bool _is_running; > > I don't see this being initialized to false. Good catch! 
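As an illustration of the `_is_running` remark above, here is a minimal sketch of giving such a flag a defined initial value with a default member initializer. It is not the actual `VMThread` code: the class `WorkerSketch` and its accessors are hypothetical names used only for this example, and the real fix may well initialize the field in a constructor instead.

```cpp
// Hypothetical stand-in for a thread class that carries a "running" flag.
// This is not HotSpot code; it only shows one way to initialize the member.
class WorkerSketch {
 private:
  // Default member initializer: the flag starts out false in every
  // constructor, so no reader can observe an indeterminate value.
  volatile bool _is_running = false;

 public:
  bool is_running() const { return _is_running; }
  void set_running(bool value) { _is_running = value; }
};
```

A freshly constructed `WorkerSketch` reports `is_running() == false` until `set_running(true)` is called.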
------------- PR: https://git.openjdk.java.net/jdk/pull/6336 From coleenp at openjdk.java.net Thu Nov 11 13:39:35 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 11 Nov 2021 13:39:35 GMT Subject: RFR: 8276658: Clean up JNI local handles code In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 06:52:45 GMT, David Holmes wrote: >> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread. >> Move the fields to JavaThread and adding JavaThread* argument. >> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed. >> Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code. >> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6. >> The commits in this change have been performance tested individually and together with no meaningful differences from mainline. > > src/hotspot/share/compiler/compileBroker.cpp line 2324: > >> 2322: // Remove the JNI handle block after the ciEnv destructor has run in >> 2323: // the previous block. >> 2324: pop_jni_handle_block(); > > Does the fact the JNIHandleMark destructor won't get executed until much later, at the end of this method, make any difference? I don't think so because most of the rest of the function is logging and it doesn't seem to do anything with JNIHandles afterwards, so there are no handles created that shouldn't be removed in that code range. 
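The destructor-timing question above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the HotSpot implementation: `ThreadSketch`, `HandleMarkSketch` and `blocks_during_tail` are hypothetical names, and the "handle blocks" are just integers on a vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a thread that owns a stack of handle blocks.
// Not the HotSpot JavaThread/JNIHandleBlock types.
struct ThreadSketch {
  std::vector<int> handle_blocks;  // ids of the currently pushed blocks
};

// RAII helper: pushes a block on construction, pops it on destruction.
class HandleMarkSketch {
  ThreadSketch* _thread;

 public:
  explicit HandleMarkSketch(ThreadSketch* thread) : _thread(thread) {
    _thread->handle_blocks.push_back(
        static_cast<int>(_thread->handle_blocks.size()));
  }
  ~HandleMarkSketch() { _thread->handle_blocks.pop_back(); }

  // Copying would pop the same block twice, so it is disabled.
  HandleMarkSketch(const HandleMarkSketch&) = delete;
  HandleMarkSketch& operator=(const HandleMarkSketch&) = delete;
};

// Returns how many blocks are pushed while the tail of the function runs.
inline std::size_t blocks_during_tail(ThreadSketch* thread) {
  HandleMarkSketch mark(thread);
  // ... work that creates local handles would go here ...
  // Tail of the function (for example logging): the block is still pushed,
  // because the destructor only runs when `mark` goes out of scope.
  return thread->handle_blocks.size();
}
```

In this model the block stays pushed until the end of the enclosing scope, which is harmless as long as the tail creates no handles that are expected to outlive the mark. An explicit pop, or an inner `{ ... }` scope around the mark, would release the block earlier.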
------------- PR: https://git.openjdk.java.net/jdk/pull/6336 From chagedorn at openjdk.java.net Thu Nov 11 13:52:37 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 11 Nov 2021 13:52:37 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 03:28:40 GMT, Dean Long wrote: > The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: > 1. added a version number to the replay file > 2. removed unnused ci fields > 3. corrected comment in TestLambdas.java src/hotspot/share/ci/ciReplay.cpp line 118: > 116: bool _protection_domain_initialized; > 117: Handle _loader; > 118: int _version; You forgot to initialize `_version` to 0. Otherwise, it could contain garbage for old replay files without version number (possibly `> REPLAY_VERSION`). ------------- PR: https://git.openjdk.java.net/jdk/pull/6344 From coleenp at openjdk.java.net Thu Nov 11 13:58:06 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 11 Nov 2021 13:58:06 GMT Subject: RFR: 8276658: Clean up JNI local handles code [v2] In-Reply-To: References: Message-ID: > JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread. > Move the fields to JavaThread and adding JavaThread* argument. > Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed. > Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code. > The commits are separate to help reviewing, but the entire change has been tested together with tier1-6. 
> The commits in this change have been performance tested individually and together with no meaningful differences from mainline. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add _is_running initialization. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6336/files - new: https://git.openjdk.java.net/jdk/pull/6336/files/239e9246..f31dfeee Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6336&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6336&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6336.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6336/head:pull/6336 PR: https://git.openjdk.java.net/jdk/pull/6336 From coleenp at openjdk.java.net Thu Nov 11 13:58:07 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 11 Nov 2021 13:58:07 GMT Subject: RFR: 8276658: Clean up JNI local handles code In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 17:16:29 GMT, Coleen Phillimore wrote: > JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread. > Move the fields to JavaThread and adding JavaThread* argument. > Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed. > Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code. > The commits are separate to help reviewing, but the entire change has been tested together with tier1-6. > The commits in this change have been performance tested individually and together with no meaningful differences from mainline. Thank you for the code review, David. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6336 From coleenp at openjdk.java.net Thu Nov 11 14:12:41 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 11 Nov 2021 14:12:41 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag In-Reply-To: References: Message-ID: <2YkKB7ZvIjfaTezJh_BSpwf1PdBH4GEW7Nnt43yDphU=.e37b5f9a-258b-4a1b-9de9-b7c0933535ec@github.com> On Thu, 11 Nov 2021 07:16:27 GMT, Aleksey Shipilev wrote: >> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message. >> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug. > > src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp line 1746: > >> 1744: // NW [ABI_REG_ARGS] <-- 1) R1_SP >> 1745: // [outgoing arguments] <-- 2) R1_SP + out_arg_slot_offset >> 1746: // [oopHandle area] <-- 3) R1_SP + oop_handle_offset (save area for critical natives) ? > > `?`. The comment `(save area for critical natives)` must be redundant now. I didn't know if the save area is still needed for something else, which is why I left the ?. I can remove the comment but haven't made any substantial changes here. I'm not sure if they're needed or not, but I can't test them if I made them. ------------- PR: https://git.openjdk.java.net/jdk/pull/6343 From coleenp at openjdk.java.net Thu Nov 11 14:22:41 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 11 Nov 2021 14:22:41 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 22:06:05 GMT, Coleen Phillimore wrote: > This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message. > Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug. Thanks for reviewing, Aleksey. I made the changes and will retest. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6343 From shade at openjdk.java.net Thu Nov 11 14:22:42 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 11 Nov 2021 14:22:42 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag In-Reply-To: <2YkKB7ZvIjfaTezJh_BSpwf1PdBH4GEW7Nnt43yDphU=.e37b5f9a-258b-4a1b-9de9-b7c0933535ec@github.com> References: <2YkKB7ZvIjfaTezJh_BSpwf1PdBH4GEW7Nnt43yDphU=.e37b5f9a-258b-4a1b-9de9-b7c0933535ec@github.com> Message-ID: On Thu, 11 Nov 2021 14:09:12 GMT, Coleen Phillimore wrote: >> src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp line 1746: >> >>> 1744: // NW [ABI_REG_ARGS] <-- 1) R1_SP >>> 1745: // [outgoing arguments] <-- 2) R1_SP + out_arg_slot_offset >>> 1746: // [oopHandle area] <-- 3) R1_SP + oop_handle_offset (save area for critical natives) ? >> >> `?`. The comment `(save area for critical natives)` must be redundant now. > > I didn't know if the save area is still needed for something else, which is why I left the ?. I can remove the comment but haven't made any substantial changes here. I'm not sure if they're needed or not, but I can't test them if I made them. I mean, you did the same here: https://github.com/openjdk/jdk/pull/6343/files#diff-060e534de775616a893aa969f3639e53666cda9e93bed7c3a3c14b9cdc4cdba0L1553-L1554 -- and that change is understandable. ------------- PR: https://git.openjdk.java.net/jdk/pull/6343 From coleenp at openjdk.java.net Thu Nov 11 14:22:45 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 11 Nov 2021 14:22:45 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 07:19:57 GMT, Aleksey Shipilev wrote: >> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message. >> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug. 
> > src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1551: > >> 1549: int total_c_args = total_in_args+1; >> 1550: if (method->is_static()) { >> 1551: total_c_args++; > > In this patch, sometimes we keep the if structure, like here, but in other places, we replace this with: > > int total_c_args = total_in_args + (method->is_static() ? 2 : 1) > > Should probably stick with a single style. Ok, that's a good suggestion. Some platforms have a method_is_static boolean and some don't, so I didn't clean up the platforms that do that later in a different way (or inconsistently). > src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1793: > >> 1791: int c_arg = arg_order.at(ai + 1); >> 1792: __ block_comment(err_msg("move %d -> %d", i, c_arg)); >> 1793: assert (c_arg != -1, "wrong direction"); > > `assert (c_arg != -1 && i != -1, "wrong direction");`? removed. > src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1905: > >> 1903: } else { >> 1904: // Compute a valid move order, using tmp_vmreg to break any cycles >> 1905: ComputeMoveOrder cmo(total_in_args, in_regs, total_c_args, out_regs, in_sig_bt, arg_order, tmp_vmreg); > > `ComputeMoveOrder` is still used somewhere, or? Yes, it's used in cpu/x86/universalUpcallHandler_x86_64.cpp: SharedRuntime::compute_move_order(in_sig_bt, > src/hotspot/share/runtime/sharedRuntime.cpp line 3019: > >> 3017: if (CriticalJNINatives && !method->is_method_handle_intrinsic()) { >> 3018: // We perform the I/O with transition to native before acquiring AdapterHandlerLibrary_lock. >> 3019: critical_entry = NativeLookup::lookup_critical_entry(method); > > `critical_entry` variable is now redundant? removed, thanks for spotting that. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6343 From fweimer at openjdk.java.net Thu Nov 11 14:23:43 2021 From: fweimer at openjdk.java.net (Florian Weimer) Date: Thu, 11 Nov 2021 14:23:43 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 08:48:07 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. 
If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few addition sign/auth calls in the backend), but there a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Simplify branch protection configure check Is the code still mapped read-write all the time? src/hotspot/cpu/aarch64/globals_aarch64.hpp line 115: > 113: range(-1, 4096) \ > 114: product(bool, UseROPProtection, false, \ > 115: "Use ROP based branch protection") \ The description is not correct. It's protection against certain ROP-based attack techniques. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From coleenp at openjdk.java.net Thu Nov 11 14:25:45 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 11 Nov 2021 14:25:45 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 12:03:55 GMT, Martin Doerr wrote: >> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message.
>> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug. > > Thanks for taking care of all platforms. `move_ptr(MacroAssembler*, VMRegPair, VMRegPair, int)` needs to get removed to avoid build warnings on PPC64 and s390. Thanks for finding move_ptr @TheRealMDoerr. ------------- PR: https://git.openjdk.java.net/jdk/pull/6343 From mdoerr at openjdk.java.net Thu Nov 11 14:34:39 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 11 Nov 2021 14:34:39 GMT Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 09:00:04 GMT, Niklas Radomski wrote: > Port the Shenandoah garbage collector (JDK-8241457)[https://bugs.openjdk.java.net/browse/JDK-8241457] to linux on ppc64le. Nice work! Looks correct. For others: Note that this change already contains feedback from my offline review. src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 74: > 72: // IU barriers are also employed to avoid resurrection of weak references, > 73: // even if Shenandoah does not operate in incremental update mode. > 74: if (ShenandoahIUBarrier || ShenandoahSATBBarrier) { Sharing the code for IU and SATB sounds like a good idea, but one needs to be careful. `ShenandoahBarrierSetC1::iu_barrier` only works with `ShenandoahIUBarrier`, so this trick can't be used in C1. It's a bit confusing, but I'm ok with this version. At least, I don't have any better suggestion at the moment. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6325 From mdoerr at openjdk.java.net Thu Nov 11 14:34:40 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 11 Nov 2021 14:34:40 GMT Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 11:32:49 GMT, Roman Kennke wrote: >> Port the Shenandoah garbage collector (JDK-8241457)[https://bugs.openjdk.java.net/browse/JDK-8241457] to linux on ppc64le. > > src/hotspot/cpu/ppc/gc/shenandoah/c1/shenandoahBarrierSetC1_ppc.cpp line 83: > >> 81: LIRGenerator* gen = access.gen(); >> 82: >> 83: if (ShenandoahCASBarrier) { > > I am not sure, but I almost think we should not even end up in the method with -ShenandoahCASBarrier. If anything, -ShenandoahCASBarrier should result in only calling super to emit regular CAS without any barriers. We hit this case when running `jdk/bin/java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=passive -version`. x86 and aarch64 check for ShenandoahCASBarrier, too. So, looks like these checks are needed and correct. ------------- PR: https://git.openjdk.java.net/jdk/pull/6325 From ihse at openjdk.java.net Thu Nov 11 14:38:49 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Thu, 11 Nov 2021 14:38:49 GMT Subject: RFR: 8277012: Use blessed modifier order in src/utils Message-ID: I ran bin/blessed-modifier-order.sh on source code in src/utils. This scripts verifies that modifiers are in the "blessed" order, and fixes it otherwise. I have manually checked the changes made by the script to make sure they are sound. There are no clear ownership of this code, but I believe it's kind of hotspot-related. 
------------- Commit messages: - 8277012: Use blessed modifier order in src/utils Changes: https://git.openjdk.java.net/jdk/pull/6354/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6354&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277012 Stats: 25 lines in 10 files changed: 0 ins; 0 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/6354.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6354/head:pull/6354 PR: https://git.openjdk.java.net/jdk/pull/6354 From adinn at openjdk.java.net Thu Nov 11 14:46:43 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 11 Nov 2021 14:46:43 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 14:20:20 GMT, Florian Weimer wrote: >> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplify branch protection configure check > > src/hotspot/cpu/aarch64/globals_aarch64.hpp line 115: > >> 113: range(-1, 4096) \ >> 114: product(bool, UseROPProtection, false, \ >> 115: "Use ROP based branch protection") \ > > The description is not correct. It's protection against certain ROP-based attack techniques. I don't agree that this is incorrect, at least not for the stated reason. The flag switches on a protection mechanism that guards against ROP attacks. To my reading that does not imply it guards against all such attacks, merely that this is the nature of the protection it offers. The description might still be considered incorrect for an unrelated reason. Its use of the adjectival phrase ROP based constitutes a transferred epithet, conflating the symptom with the medicine. In other words, the protection offered is not ROP based i.e. does not rely on an ROP technique. What it does is protect against ROP attacks. So, I'd suggest rewording to "Enable protection of branches against ROP attacks". 
Florian, if you want to argue for rewording that to "Enable protection of branches against some categories of ROP attacks" or some other equivalently qualified variant please feel free to make a case. However, I don't think see any need to add that rider, nor any precedent in any of the other short descriptions provided in globals.hpp. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From adinn at openjdk.java.net Thu Nov 11 14:56:46 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 11 Nov 2021 14:56:46 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 14:20:33 GMT, Florian Weimer wrote: > Is the code still mapped read-write all the time? That depends on what code you mean. The JVM code compiled from C++ sources is mapped RO(X) in the text section like any compiled C/C++ code. Protection of that code is covered by the changes to the build system. The runtime generated runtime stubs and Java method code into which this patch may insert the required PAC instructions are written into a code cache in a section which is mapped RW(X) all the time. It would be hard to map even a subset of this code cache RO because generated code includes call and data sites that need to be patched during execution. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From fweimer at openjdk.java.net Thu Nov 11 14:56:46 2021 From: fweimer at openjdk.java.net (Florian Weimer) Date: Thu, 11 Nov 2021 14:56:46 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 14:43:59 GMT, Andrew Dinn wrote: >> src/hotspot/cpu/aarch64/globals_aarch64.hpp line 115: >> >>> 113: range(-1, 4096) \ >>> 114: product(bool, UseROPProtection, false, \ >>> 115: "Use ROP based branch protection") \ >> >> The description is not correct. It's protection against certain ROP-based attack techniques. 
> > I don't agree that this is incorrect, at least not for the stated reason. The flag switches on a protection mechanism that guards against ROP attacks. To my reading that does not imply it guards against all such attacks, merely that this is the nature of the protection it offers. > > The description might still be considered incorrect for an unrelated reason. Its use of the adjectival phrase ROP based constitutes a transferred epithet, conflating the symptom with the medicine. In other words, the protection offered is not ROP based i.e. does not rely on an ROP technique. What it does is protect against ROP attacks. So, I'd suggest rewording to > > "Enable protection of branches against ROP attacks". > > Florian, if you want to argue for rewording that to "Enable protection of branches against some categories of ROP attacks" or some other equivalently qualified variant please feel free to make a case. However, I don't think see any need to add that rider, nor any precedent in any of the other short descriptions provided in globals.hpp. I did mean the description, not the flag name. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From adinn at openjdk.java.net Thu Nov 11 15:02:39 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 11 Nov 2021 15:02:39 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 14:53:54 GMT, Florian Weimer wrote: >> I don't agree that this is incorrect, at least not for the stated reason. The flag switches on a protection mechanism that guards against ROP attacks. To my reading that does not imply it guards against all such attacks, merely that this is the nature of the protection it offers. >> >> The description might still be considered incorrect for an unrelated reason. Its use of the adjectival phrase ROP based constitutes a transferred epithet, conflating the symptom with the medicine. 
In other words, the protection offered is not ROP based i.e. does not rely on an ROP technique. What it does is protect against ROP attacks. So, I'd suggest rewording to >> >> "Enable protection of branches against ROP attacks". >> >> Florian, if you want to argue for rewording that to "Enable protection of branches against some categories of ROP attacks" or some other equivalently qualified variant please feel free to make a case. However, I don't think see any need to add that rider, nor any precedent in any of the other short descriptions provided in globals.hpp. > > I did mean the description, not the flag name. Yes, understood. I too was talking about the description even though I introduced my comment by talking about what the flag does. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From rkennke at openjdk.java.net Thu Nov 11 15:04:36 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 11 Nov 2021 15:04:36 GMT Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 14:30:05 GMT, Martin Doerr wrote: >> src/hotspot/cpu/ppc/gc/shenandoah/c1/shenandoahBarrierSetC1_ppc.cpp line 83: >> >>> 81: LIRGenerator* gen = access.gen(); >>> 82: >>> 83: if (ShenandoahCASBarrier) { >> >> I am not sure, but I almost think we should not even end up in the method with -ShenandoahCASBarrier. If anything, -ShenandoahCASBarrier should result in only calling super to emit regular CAS without any barriers. > > We hit this case when running `jdk/bin/java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=passive -version`. x86 and aarch64 check for ShenandoahCASBarrier, too. So, looks like these checks are needed and correct. Ok then. 
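The `ShenandoahCASBarrier` check discussed above follows a common pattern: when the flag is off, defer to the generic code path. A minimal sketch, assuming a simplified class hierarchy; `BaseBarrierSketch`, `ShenandoahBarrierSketch` and the returned strings are hypothetical and do not mirror the real `BarrierSetC1` API.

```cpp
#include <string>

// Hypothetical generic barrier set: emits a plain compare-and-swap.
struct BaseBarrierSketch {
  virtual ~BaseBarrierSketch() = default;
  virtual std::string cmpxchg() { return "plain CAS"; }
};

// Hypothetical GC-specific barrier set. `cas_barrier_enabled` models the
// ShenandoahCASBarrier flag from the discussion above.
struct ShenandoahBarrierSketch : BaseBarrierSketch {
  bool cas_barrier_enabled;

  explicit ShenandoahBarrierSketch(bool enabled)
      : cas_barrier_enabled(enabled) {}

  std::string cmpxchg() override {
    if (cas_barrier_enabled) {
      return "CAS with Shenandoah barrier";
    }
    // Flag off (as in the passive-mode run mentioned above):
    // fall back to the generic path without any GC barrier.
    return BaseBarrierSketch::cmpxchg();
  }
};
```

With the flag disabled, `ShenandoahBarrierSketch(false).cmpxchg()` yields the same result as the base class, which is the behaviour the reviewers agreed the check needs to preserve.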
------------- PR: https://git.openjdk.java.net/jdk/pull/6325 From duke at openjdk.java.net Thu Nov 11 15:33:33 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Thu, 11 Nov 2021 15:33:33 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 14:52:54 GMT, Andrew Dinn wrote: > The runtime generated runtime stubs and Java method code into which this patch may insert the required PAC instructions are written into a code cache in a section which is mapped RW(X) all the time. It would be hard to map even a subset of this code cache RO because generated code includes call and data sites that need to be patched during execution. Am I right in saying that for macOS, all generated code is remapped RO before execution? An additional concern I have is that if the globals data was attacked then the UseROPProtection flag could be flipped, and all code after that point would be generated without ROP protection. Marking all the globals data as RO would fix that. Alternatively remove UseROPProtection and then in the macroassembler always generate PAC code, using just the subset of instructions that are NOPs on non-PAC hardware. Or alternatively only generate PAC code based on a #define set at build time. Each option has its own downsides. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From coleenp at openjdk.java.net Thu Nov 11 15:56:33 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 11 Nov 2021 15:56:33 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag In-Reply-To: References: <2YkKB7ZvIjfaTezJh_BSpwf1PdBH4GEW7Nnt43yDphU=.e37b5f9a-258b-4a1b-9de9-b7c0933535ec@github.com> Message-ID: On Thu, 11 Nov 2021 14:17:15 GMT, Aleksey Shipilev wrote: >> I didn't know if the save area is still needed for something else, which is why I left the ?. I can remove the comment but haven't made any substantial changes here.
I'm not sure if they're needed or not, but I can't test them if I made them. > > I mean, you did the same here: https://github.com/openjdk/jdk/pull/6343/files#diff-060e534de775616a893aa969f3639e53666cda9e93bed7c3a3c14b9cdc4cdba0L1553-L1554 -- and that change is understandable. Looking further, the area is needed (stores oops there), but not the comment. ------------- PR: https://git.openjdk.java.net/jdk/pull/6343 From coleenp at openjdk.java.net Thu Nov 11 16:19:01 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 11 Nov 2021 16:19:01 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2] In-Reply-To: References: Message-ID: > This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message. > Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Some platform adjustments. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6343/files - new: https://git.openjdk.java.net/jdk/pull/6343/files/7e9c641d..9b8ff9ae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6343&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6343&range=00-01 Stats: 67 lines in 6 files changed: 0 ins; 57 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/6343.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6343/head:pull/6343 PR: https://git.openjdk.java.net/jdk/pull/6343 From adinn at openjdk.java.net Thu Nov 11 16:34:33 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 11 Nov 2021 16:34:33 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 15:30:29 GMT, Alan Hayward wrote: > Am I right in saying that for macOS, all generated code is remapped RO before execution?
Ah, no, it seems the code cache is not RWX all the time as far as Java threads are concerned. The macOS/AArch64 code is strategically calling pthread_jit_write_protect_np at Java <-> JVM transition points. That ensures that executable regions are executable but not writable (RX) from a Java thread when running JITted Java code and are writable but not executable (RW) when it calls into JVM code. > An additional concern I have is that if the globals data was attacked then the UseROPProtection flag could be flipped, and all code after that point would be generated without ROP protection. Marking all the globals data as RO would fix that. Alternatively remove UseROPProtection and then in the macroassembler always generate PAC code, using just the subset of instructions that are NOPs on non-PAC hardware. Or alternatively only generate PAC code based on a #define set at build time. Each option has its own downsides. Globals data can legitimately be written during JVM startup (perhaps in some cases also during execution?). So, they cannot simply be marked as RO. I am not sure this concern is really warranted. If an attacker is already able to overwrite UseROPProtection then a concern over the resulting omission of JITted ROP protection seems like attending to the loud banging of the stable door while Shergar has already been diced into stew meat. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From mdoerr at openjdk.java.net Thu Nov 11 16:35:32 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 11 Nov 2021 16:35:32 GMT Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 09:00:04 GMT, Niklas Radomski wrote: > Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le.
src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 536: > 534: if (!preserve_gp_registers) { __ clobber_volatile_gprs(dst); } > 535: if (!needs_frame) { __ clobber_carg_stack_slots(tmp1); } > 536: #endif This clobber code was certainly good during development and early testing. But is it worth keeping it? Other GCs and other places don't have it any more. So, I'd slightly prefer removal. Feel free to do so if you agree. ------------- PR: https://git.openjdk.java.net/jdk/pull/6325 From simonis at openjdk.java.net Thu Nov 11 16:43:14 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 11 Nov 2021 16:43:14 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v8] In-Reply-To: References: Message-ID: > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): > > public static boolean isAlpha(int c) { > try { > return IS_ALPHA[c]; > } catch (ArrayIndexOutOfBoundsException ex) { > return false; > } > } > > > ### Solution > > Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. 
This results in a tenfold performance improvement for the above code: > > -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions > Benchmark (exceptionProbability) Mode Cnt Score Error Units > ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op > ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op > ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op > ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op > > -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions > Benchmark (exceptionProbability) Mode Cnt Score Error Units > ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op > ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op > ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op > ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op > > > ### Implementation details > > - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. > - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception. > - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. > - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Fix build issue for minimal/zero build one more time ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5488/files - new: https://git.openjdk.java.net/jdk/pull/5488/files/625da2f9..b3c130c8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From aph at openjdk.java.net Thu Nov 11 16:51:39 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 11 Nov 2021 16:51:39 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 14:59:32 GMT, Andrew Dinn wrote: >> I did mean the description, not the flag name. > > Yes, understood. I too was talking about the description even though I introduced my comment by talking about what the flag does. 
`"Protect branches against ROP attacks".` ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From kvn at openjdk.java.net Thu Nov 11 16:58:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 11 Nov 2021 16:58:39 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 03:28:40 GMT, Dean Long wrote: > The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: > 1. added a version number to the replay file > 2. removed unused ci fields > 3. corrected comment in TestLambdas.java src/hotspot/share/ci/ciReplay.cpp line 645: > 643: _version = parse_int("version"); > 644: if (_version > REPLAY_VERSION) { > 645: report_error("unrecognized version"); It would be nice to print both version numbers in the error message. Also, I would like to be able to ignore such an error and process the file anyway. Does `report_error` allow that? ------------- PR: https://git.openjdk.java.net/jdk/pull/6344 From kvn at openjdk.java.net Thu Nov 11 17:02:38 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 11 Nov 2021 17:02:38 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 03:28:40 GMT, Dean Long wrote: > The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: > 1. added a version number to the replay file > 2. removed unused ci fields > 3. corrected comment in TestLambdas.java src/hotspot/share/ci/ciReplay.cpp line 837: > 835: rec->_state = parse_int("state"); > 836: if (_version < 1) { > 837: parse_int("current_mileage"); Why is it not assigned to `rec->_current_mileage` here?
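The two review comments above concern version-gated parsing of the replay file. The sketch below shows that general idea only. It is written in Java rather than HotSpot's C++, and every name in it (`ReplayVersionSketch`, `parseRecord`, the `state [legacy] counter` record layout) is hypothetical. It assumes, as one possible reading of the quoted snippet, that files older than version 1 carry an extra value which must be consumed so later tokens stay aligned. Whether `report_error` can be made non-fatal is not answered in this thread.

```java
import java.util.Scanner;

// Illustrative only: a hypothetical versioned record reader, not HotSpot's ciReplay code.
final class ReplayVersionSketch {
    // Assumed newest file version this reader understands.
    static final int REPLAY_VERSION = 1;

    // Rejects files newer than the reader and names both versions in the message.
    static void checkVersion(int fileVersion) {
        if (fileVersion > REPLAY_VERSION) {
            throw new IllegalArgumentException(
                "unrecognized version " + fileVersion
                + " (reader supports up to " + REPLAY_VERSION + ")");
        }
    }

    // Reads "state [legacy] counter" and returns {state, counter}.
    // The middle token is assumed to exist only in version 0 files.
    static int[] parseRecord(int fileVersion, Scanner in) {
        checkVersion(fileVersion);
        int state = in.nextInt();
        if (fileVersion < 1) {
            in.nextInt(); // consume the legacy value so later tokens stay aligned
        }
        int counter = in.nextInt();
        return new int[] { state, counter };
    }

    public static void main(String[] args) {
        int[] oldRec = parseRecord(0, new Scanner("7 99 3"));
        int[] newRec = parseRecord(1, new Scanner("7 3"));
        System.out.println(oldRec[0] + " " + oldRec[1]);
        System.out.println(newRec[0] + " " + newRec[1]);
    }
}
```

Putting both numbers in the message lets a user see at a glance whether the file is merely one version ahead of the reader.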
------------- PR: https://git.openjdk.java.net/jdk/pull/6344 From psandoz at openjdk.java.net Thu Nov 11 17:10:04 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 11 Nov 2021 17:10:04 GMT Subject: RFR: 8271515: Integration of JEP 417: Vector API (Third Incubator) [v9] In-Reply-To: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com> References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com> Message-ID: > This PR improves the performance of vector operations that accept masks on architectures that support masking in hardware, specifically Intel AVX512 and ARM SVE. > > On architectures that do not support masking in hardware the same technique as before is applied to most operations, specifically composition using blend. > > Masked loads/stores are a special form of masked operation that require additional care to ensure out-of-bounds access throw exceptions. The range checking has not been fully optimized and will require further work. > > No API enhancements were required and only a few additional tests were needed. Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: Add missing null check post mask unboxing. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5873/files - new: https://git.openjdk.java.net/jdk/pull/5873/files/571e6f39..11906870 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5873&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5873&range=07-08 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5873.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5873/head:pull/5873 PR: https://git.openjdk.java.net/jdk/pull/5873 From kvn at openjdk.java.net Thu Nov 11 17:19:37 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 11 Nov 2021 17:19:37 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v8] In-Reply-To: References: Message-ID: <5Q-g54nyWmdaykYA01MSSN5yGS2qoAqnXPtxhyD12fU=.103ffc2e-dc83-4da4-876a-f05735feda1b@github.com> On Thu, 11 Nov 2021 16:43:14 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. 
This results in a tenfold performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception. >> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issue for minimal/zero build one more time I suggest not rushing it and waiting for JDK 19, because JDK 18 is almost done. I wanted to look at this too, but I am on vacation. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From simonis at openjdk.java.net Thu Nov 11 17:35:47 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 11 Nov 2021 17:35:47 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v8] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 16:43:14 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance.
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a tenfold performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issue for minimal/zero build one more time Hi Vladimir, I'd be really happy if you could take a look at this PR. On the other hand, I did intend to bring this to JDK 18. There's still a month until RDP 1 starts and this PR has already been discussed for two months. If you say "don't hurry" does that mean that you won't have time to review it within the next month?
Best regards and a pleasant vacation, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From kvn at openjdk.java.net Thu Nov 11 17:50:36 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 11 Nov 2021 17:50:36 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v8] In-Reply-To: References: Message-ID: <-qMqxY4R-0pRP7L0xxagLIrk-wW0XeLo7g13c3GQ8uk=.1d4c438a-e0dc-4a5d-a447-d4d2130bc9fc@github.com> On Thu, 11 Nov 2021 16:43:14 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ±
1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception. >> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
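For readers who want to see the pattern under discussion in runnable form, here is a minimal sketch. The first method mirrors the quoted `isAlpha()` shape, where an out-of-range index is handled by catching the implicit `ArrayIndexOutOfBoundsException`; the second uses an explicit range check and never raises one. The class name, the 128-entry table, and its contents are assumptions made for illustration; they are not taken from Tomcat, and this is not the micro-benchmark from the PR.

```java
// Illustrative sketch; table size and contents are assumptions, not Tomcat's.
final class IsAlphaSketch {
    private static final int ARRAY_SIZE = 128; // assumed: ASCII-only lookup table
    private static final boolean[] IS_ALPHA = new boolean[ARRAY_SIZE];

    static {
        for (int c = 0; c < ARRAY_SIZE; c++) {
            IS_ALPHA[c] = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        }
    }

    // Same shape as the quoted method: the exception is part of normal control flow.
    static boolean isAlphaImplicit(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Alternative with an explicit range check, so no exception is ever created.
    static boolean isAlphaChecked(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] samples = { 'a', 'Z', '0', -1, 200 };
        for (int c : samples) {
            System.out.println(c + " -> " + isAlphaImplicit(c) + " / " + isAlphaChecked(c));
        }
    }
}
```

Both methods return the same answers; they differ only in whether an out-of-range input goes through exception creation, which is the path the `exceptionProbability` column in the tables above exercises.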
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issue for minimal/zero build one more time My vacation has just started and I will have just a week before RDP1 to do a review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From simonis at openjdk.java.net Thu Nov 11 17:50:37 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 11 Nov 2021 17:50:37 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v8] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 16:43:14 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ±
77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception. >> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274).
Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix build issue for minimal/zero build one more time OK, enjoy your vacation then... ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From aph at openjdk.java.net Thu Nov 11 18:10:35 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 11 Nov 2021 18:10:35 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 16:31:41 GMT, Andrew Dinn wrote: > > Am I right in saying that for macOS, all generated code is remapped RO before execution? > > Ah, no, it seems the code cache is not RWX all the time as far as Java threads are concerned. The macOS/AArch64 code is strategically calling pthread_jit_write_protect_np at Java <-> JVM transition points. And this requires magic kernel support.
I did mention it to a kernel engineer who wasn't very impressed, but I think it's pretty cool. It's possible to emulate this to some extent with memory protection keys on POWER and (recent) x86. See `pkey_alloc`. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From vladimir.kozlov at oracle.com Thu Nov 11 20:37:26 2021 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 11 Nov 2021 12:37:26 -0800 Subject: [External] : Re: RFC - Improving C2 Escape Analysis In-Reply-To: References: <20210930140335.648146897@eggemoggin.niobe.net> <415a6622-a46c-33da-8e39-c8f3068c7df3@oracle.com> <44563450-403B-4A15-95AB-5FB5DCA4ED0B@oracle.com> <81f86a0b-dfb7-0b45-1779-49209a82ae40@oracle.com> <0f30507c-e0f0-c380-568b-ac441611e116@oracle.com> <787f8fbb-83e6-0867-1c97-ae2516df114b@oracle.com> <457a3277-bc96-d481-2a69-4559f25cd52e@oracle.com> Message-ID: <4bb3c804-d9fd-b9da-a4d3-c504d2e46933@oracle.com> Hi Cesar, On 11/11/21 11:24 AM, Cesar Soares Lucas wrote: > Hi Vladimir, > > Thank you for the feedback and sorry for the delay in getting back to you! > > > Yes, finding a solution for allocation merges (or NULL) is a pain. I spent some > > time investigating possible solutions for it but "no cigar". Maybe we do > > indeed need control flow analysis to resolve this. > > Can you elaborate a bit on the approaches you tried and why you didn't like > them? By allocation merges do you mean nested objects like "obj1.obj2.x", > right? Did you try solving both control-flow merge issues and also allocation > merges? I mean control flow merges of allocations, like in your "Code Example 4". I tried to create separate unique instance IDs (in addition to Node::_idx) to use for the merged-allocations case (not the NULL case), which would look like one allocation after the merge point with different paths for field initialization. But I stumbled on some issues and did not proceed further.
After some thinking I decided that it is the wrong approach since it still doesn't solve the main merge issue of flow-insensitive analysis: https://bugs.openjdk.java.net/browse/JDK-6726999 test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java The issue with deeply nested allocations `new A(new B(new C()))` will be addressed by the Iterative EA I propose: https://bugs.openjdk.java.net/browse/JDK-8276455 > > > There are 2 test files with small methods for different EA cases I used to > > see how EA works: > > These examples are being very helpful, thank you again! > > > Yes, I think it would be good to have a prototype if you are comfortable to > > work with C2 code already. I proposed small RFEs just for warmup ;) > > I talked with my colleagues and we decided to start the work by trying to fix > the control/data-flow merge issues - *perhaps not for all cases, but at least > for some of them*. Then, based on our experience with this and some > benchmarking we'll decide if we really need flow-sensitive analysis and how to > best approach that. Use Test6726999.java for that. It may need to be modified to verify the correctness of results (currently it just prints the result). > > We'll definitely take a look at the RFEs as we move along! Implementing Stadler > algorithm was just something that crossed my mind initially, it's very likely > the last approach we'd try ... I don't want to bite more than I can chew.. I may look at some RFEs myself after I am done with 8276455. Please let me know if you pick one, to avoid duplicated work.
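For readers following the thread, the two source shapes mentioned above (a control-flow merge of two allocations, and nested allocations) might look roughly like the sketch below. This is only an illustration under assumptions: the names `EscapeAnalysisShapes`, `Point`, `Holder`, `mergedAllocations` and `nestedAllocations` are hypothetical and are not taken from Test6726999.java or from the examples linked in the thread. Whether C2 scalarizes either shape depends on the JDK build and on inlining decisions, so the code shows the pattern and makes no claim about the outcome.

```java
// Hypothetical sketch of two allocation shapes discussed in the thread.
final class EscapeAnalysisShapes {

    // Small value-like class; neither instance below escapes its method.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Wrapper used to build a nested allocation.
    static final class Holder {
        final Point p;
        Holder(Point p) { this.p = p; }
    }

    // Control-flow merge: 'p' refers to one of two distinct allocations
    // after the if/else, so the field loads below see a merged reference.
    static int mergedAllocations(boolean cond, int v) {
        Point p;
        if (cond) {
            p = new Point(v, 1);
        } else {
            p = new Point(1, v);
        }
        return p.x + p.y;
    }

    // Nested allocation: the inner Point is stored into the outer Holder,
    // so the inner object is only reachable through the outer one.
    static int nestedAllocations(int v) {
        Holder h = new Holder(new Point(v, v));
        return h.p.x + h.p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        // Loop long enough that a JIT compiler may compile both methods.
        for (int i = 0; i < 100_000; i++) {
            sum += mergedAllocations((i & 1) == 0, i);
            sum += nestedAllocations(i);
        }
        System.out.println(sum);
    }
}
```

One way to experiment with such a sketch would be to run it on a debug build and compare allocation behaviour with escape analysis enabled and disabled; the exact diagnostic flags to use for that are left out here because they vary between builds.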
Regards, Vladimir K > > > Regards, > Cesar > ------------------------------------------------------------------------------------------------------------------------ > *From:* Vladimir Kozlov > *Sent:* October 29, 2021 5:27 PM > *To:* Cesar Soares Lucas ; Tobias Hartmann ; Ron Pressler > > *Cc:* John Rose ; Mark Reinhold ; hotspot-dev at openjdk.java.net > ; Brian Stafford ; Martijn Verburg > ; Hohensee, Paul > *Subject:* Re: [External] : Re: RFC - Improving C2 Escape Analysis > On 10/29/21 4:50 PM, Cesar Soares Lucas wrote: >> Hi Vladimir and Tobias, >> >> >> Sure, here are four examples of EA and/or scalarization failing due to >> >> complicated control/data flow: >> >> https://cr.openjdk.java.net/~thartmann/EA_examples >> >> >> There are 2 test files with small methods for different EA cases I used to >> >> see how EA works: >> >> >> >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java >> >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java >> >> Thank you for the examples, Tobias/Vladimir. This is being very helpful. >> >> >> Yes, finding a solution for allocation merges (or NULL) is a pain. I spent >> >> some time investigating possible solutions for it but "no cigar". Maybe we >> >> do indeed need control flow analysis to resolve this. >> >> By "need control flow analysis" you mean the flow-sensitive EA algorithm? My > > Yes. > > To clarify. I investigated solutions in current flow-insensitive EA. > >> first idea to handle these control/data-merge issues was to implement in C2 the >> same algorithm used by GRAAL - i.e., the algorithm described in Stadler et.
al >> PEA paper. Do you think this is reasonable? > > Yes, I think it would be good to have a prototype if you are comfortable to work with C2 code already. > I proposed small RFEs just for warmup ;) > >> >> >> I am currently looking at iterative EA. Do more EA rounds if we can >> >> eliminate more connected allocations. It was proposed by Vladimir Ivanov and >> >> I have a working prototype. >> >> Cool! I'm curious, when do you plan to submit a Pull Request for this? > > I am investigating regressions in some benchmarks. > >> >> >> There is also a suggestion from the Amazon Java group about "C2 Partial Escape >> >> Analysis" which needs more discussion: >> >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html >> >> I'd love to hear from them about their experience with these issues and if they >> have any plans to work on this moving forward! I'll ping them on the thread >> that you linked above. > > Yes, I would like them to participate too (CCing to Paul). They sent the proposal almost 6 months ago and we did not hear > any additional information after Vladimir Ivanov replied. > > Regards, > Vladimir K > >> >> >> Regards, >> Cesar >> ------------------------------------------------------------------------------------------------------------------------ >> *From:* Vladimir Kozlov >> *Sent:* October 27, 2021 10:26 AM >> *To:* Tobias Hartmann ; Cesar Soares Lucas ; Ron Pressler >> >> *Cc:* John Rose ; Mark Reinhold ; hotspot-dev at openjdk.java.net >> ; Brian Stafford ; Martijn Verburg >> >> *Subject:* Re: [External] : Re: RFC - Improving C2 Escape Analysis >> First.
Thank you, Cesar, for collecting data about C2 EA shortcomings. >> >> I agree with the cases Tobias pointed out as possible starting points to improve EA. >> >> Yes, finding a solution for allocation merges (or NULL) is a pain. I spent some time investigating possible solutions for >> it but "no cigar". Maybe we do indeed need control flow analysis to resolve this. >> >> I looked through JBS and found a few issues which do not require writing a new EA: >> >> https://bugs.openjdk.java.net/browse/JDK-7149991 >> https://bugs.openjdk.java.net/browse/JDK-8059378 >> https://bugs.openjdk.java.net/browse/JDK-8073358 >>
https://bugs.openjdk.java.net/browse/JDK-8155769 >> >> Tobias also has a fix prototype for the following bug, which has not been fixed yet: >> https://bugs.openjdk.java.net/browse/JDK-8236493 >> >> There are 2 test files with small methods for different EA cases I used to see how EA works: >> >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java >> >> You can start by looking at the above RFEs/bugs, or run these tests and see why scalarization failed for some cases. Except for >> known merge issue: >> >> https://bugs.openjdk.java.net/browse/JDK-6853701 >> >> I am currently looking at iterative EA. Do more EA rounds if we can eliminate more connected allocations. It was >> proposed by Vladimir Ivanov and I have a working prototype.
>> >> There is also a suggestion from the Amazon Java group about "C2 Partial Escape Analysis" which needs more discussion: >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html >> >> Thanks, >> Vladimir K >> >> On 10/27/21 3:04 AM, Tobias Hartmann wrote: >>> Hi Cesar, >>> >>> On 27.10.21 08:20, Cesar Soares Lucas wrote: >>>> Right. I was suspecting this to be the most critical issue indeed. However, I >>>> didn't know there was a case where "... the object does not escape on any paths >>>> but control flow is too complicated for EA to prove that." Is this an issue >>>> tracked in JBS or perhaps you can show me an example where this happens? >>> >>> Sure, here are four examples of EA and/or scalarization failing due to complicated control/data >>> flow: https://cr.openjdk.java.net/~thartmann/EA_examples >>> >>> All examples would completely fold with inline types (Valhalla). >>> >>> I'm not sure if these issues are tracked by JBS issues but there's most likely an overlap with some >>> of the issues you already described.
>>> >>> Best regards, >>> Tobias >>> From psandoz at openjdk.java.net Thu Nov 11 21:44:53 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 11 Nov 2021 21:44:53 GMT Subject: RFR: 8271515: Integration of JEP 417: Vector API (Third Incubator) [v10] In-Reply-To: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com> References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com> Message-ID: > This PR improves the performance of vector operations that accept masks on architectures that support masking in hardware, specifically Intel AVX512 and ARM SVE. > > On architectures that do not support masking in hardware the same technique as before is applied to most operations, specifically composition using blend. > > Masked loads/stores are a special form of masked operation that require additional care to ensure out-of-bounds access throw exceptions. The range checking has not been fully optimized and will require further work. > > No API enhancements were required and only a few additional tests were needed. Paul Sandoz has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Merge branch 'master' into JDK-8271515-vector-api - Add missing null check post mask unboxing. - Merge pull request #2 from nsjian/vector-conversion-fix AArch64: Incorrect SVE double to int and float to long vector conversion - Incorrect double to int and float to long vector conversion Like JDK-8276151, SVE vector double to int and float to long conversions have similar issue. According to Java language specification [1], we should convert double/float to integer/long directly, instead of converting to long/int and then narrowing/extending to target types. Test cases will be updated in JDK-8276151. 
[1] https://docs.oracle.com/javase/specs/jls/se17/html/jls-5.html#jls-5.1.3 - Merge branch 'master' into JDK-8271515-vector-api - Merge pull request #1 from nsjian/JDK-8271515 Address AArch64 review comments from Nick. - Address review comments from Nick. - Merge branch 'master' into JDK-8271515-vector-api - Resolve review comments. - Merge branch 'master' into JDK-8271515-vector-api - ... and 6 more: https://git.openjdk.java.net/jdk/compare/6f35eede...44697f8b ------------- Changes: https://git.openjdk.java.net/jdk/pull/5873/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5873&range=09 Stats: 21982 lines in 104 files changed: 16217 ins; 2087 del; 3678 mod Patch: https://git.openjdk.java.net/jdk/pull/5873.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5873/head:pull/5873 PR: https://git.openjdk.java.net/jdk/pull/5873 From jvernee at openjdk.java.net Thu Nov 11 22:06:34 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Thu, 11 Nov 2021 22:06:34 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2] In-Reply-To: References: Message-ID: <83KgAcV3vKaG38tiMA6R4qOZdOqIIBK-ZTIxWLh65lc=.98eea96e-53ef-4bad-b627-46b12f47d35c@github.com> On Thu, 11 Nov 2021 14:17:40 GMT, Coleen Phillimore wrote: >> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1905: >> >>> 1903: } else { >>> 1904: // Compute a valid move order, using tmp_vmreg to break any cycles >>> 1905: ComputeMoveOrder cmo(total_in_args, in_regs, total_c_args, out_regs, in_sig_bt, arg_order, tmp_vmreg); >> >> `ComputeMoveOrder` is still used somewhere, or? > > Yes, it's used in > cpu/x86/universalUpcallHandler_x86_64.cpp: SharedRuntime::compute_move_order(in_sig_bt, FWIW, I have a change in panama-foreign repo that replaces that use with a custom class. Will remove ComputeMoveOrder there as well, and it should be completely gone after the next JEP integration, probably in 19 (the JEP for 18 doesn't include that change). 
------------- PR: https://git.openjdk.java.net/jdk/pull/6343 From mdoerr at openjdk.java.net Thu Nov 11 22:20:33 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 11 Nov 2021 22:20:33 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 16:19:01 GMT, Coleen Phillimore wrote: >> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message. >> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Some platform adjustments. LGTM. Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6343 From duke at openjdk.java.net Thu Nov 11 22:27:46 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 11 Nov 2021 22:27:46 GMT Subject: Integrated: 8186670: Implement _onSpinWait() intrinsic for AArch64 In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 11:26:03 GMT, Evgeny Astigeevich wrote: > This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). > > It adds DIAGNOSTIC options `OnSpinWaitInst=inst`, where `inst` can be: > > - `none`: no implementation for spin pauses. This is the default value. > - `nop`: use `nop` instruction for spin pauses. > - `isb`: use `isb` instruction for spin pauses. > - `yield`: use `yield` instruction for spin pauses. > > And `OnSpinWaitInstCount=count`, where `count` specifies a number of `OnSpinWaitInst` and can be in `1..99` range. It is an error to use `OnSpinWaitInstCount` when `OnSpinWaitInst` is `none`. > > The code for the `Thread.onSpinWait` intrinsic is generated based on the values of `OnSpinWaitInst` and `OnSpinWaitInstCount`.
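As a rough illustration of where the options described above take effect, a Java spin loop that calls `Thread.onSpinWait()` could look like the sketch below. The class and method names (`SpinWaitSketch`, `awaitFlag`) are hypothetical and are not part of the PR; `Thread.onSpinWait()` itself is the standard API (available since Java 9), and it is only a hint that the JIT may or may not turn into a pause instruction.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a bounded busy-wait that uses the spin-wait hint.
final class SpinWaitSketch {

    // Spins until 'flag' becomes true or 'maxSpins' iterations have passed.
    // Returns the last observed value of the flag.
    static boolean awaitFlag(AtomicBoolean flag, long maxSpins) {
        for (long i = 0; i < maxSpins; i++) {
            if (flag.get()) {
                return true;
            }
            // Hint to the runtime that this thread is busy-waiting.
            Thread.onSpinWait();
        }
        return flag.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread setter = new Thread(() -> ready.set(true));
        setter.start();
        boolean seen = awaitFlag(ready, Long.MAX_VALUE);
        setter.join();
        System.out.println("flag observed: " + seen);
    }
}
```

Since the PR describes the options as diagnostic, on an AArch64 build that contains this change they would presumably need to be unlocked first, for example `java -XX:+UnlockDiagnosticVMOptions -XX:OnSpinWaitInst=isb -XX:OnSpinWaitInstCount=1 SpinWaitSketch.java`; treat that command line as an assumption rather than a tested invocation.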
> > Testing: > > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > CSR: https://bugs.openjdk.java.net/browse/JDK-8274564 This pull request has now been integrated. Changeset: 6954b98f Author: Evgeny Astigeevich Committer: Paul Hohensee URL: https://git.openjdk.java.net/jdk/commit/6954b98f8faf29b6c2d13687a7a94e83302bdd85 Stats: 766 lines in 13 files changed: 764 ins; 0 del; 2 mod 8186670: Implement _onSpinWait() intrinsic for AArch64 Reviewed-by: phh, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From dlong at openjdk.java.net Fri Nov 12 03:24:33 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 12 Nov 2021 03:24:33 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information In-Reply-To: References: Message-ID: <7Kz38EH-p2ElAwqeGSlbMnaYXH2kpzTTYldNhvA5buM=.495318a2-1505-442f-9ae9-6ae1ebcd11c4@github.com> On Thu, 11 Nov 2021 16:55:17 GMT, Vladimir Kozlov wrote: > Would be nice to print both versions numbers in error message. > Also I would like to be able ignore such error and process file anyway. Is report_error allows it? Currently report_error() saves the error string to be printed later, so to have an error message that requires formatting, I guess I would have to allocate the string using malloc or ResourceObj memory. Right now the only ignore flag is ReplayIgnoreInitErrors. I could introduce something like ReplayIgnoreAllErrors , or maybe turn this error into a warning. Christian is waiting on this version number support, so maybe I could create a separate RFE for the above suggestions? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6344 From dlong at openjdk.java.net Fri Nov 12 03:36:59 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 12 Nov 2021 03:36:59 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information [v2] In-Reply-To: References: Message-ID: > The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: > 1. added a version number to the replay file > 2. removed unnused ci fields > 3. corrected comment in TestLambdas.java Dean Long has updated the pull request incrementally with three additional commits since the last revision: - _current_mileage field is never used, stub out access - initialize _version to 0 - remove comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6344/files - new: https://git.openjdk.java.net/jdk/pull/6344/files/a7580022..20bea849 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=00-01 Stats: 7 lines in 2 files changed: 1 ins; 5 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6344.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6344/head:pull/6344 PR: https://git.openjdk.java.net/jdk/pull/6344 From dholmes at openjdk.java.net Fri Nov 12 04:55:34 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 12 Nov 2021 04:55:34 GMT Subject: RFR: 8276658: Clean up JNI local handles code [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 13:58:06 GMT, Coleen Phillimore wrote: >> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread. >> Move the fields to JavaThread and adding JavaThread* argument. 
>> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed. >> Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code. >> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6. >> The commits in this change have been performance tested individually and together with no meaningful differences from mainline. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add _is_running initialization. Marked as reviewed by dholmes (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6336 From dholmes at openjdk.java.net Fri Nov 12 06:50:33 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 12 Nov 2021 06:50:33 GMT Subject: RFR: 8277012: Use blessed modifier order in src/utils In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 14:32:18 GMT, Magnus Ihse Bursie wrote: > I ran bin/blessed-modifier-order.sh on source code in src/utils. This scripts verifies that modifiers are in the "blessed" order, and fixes it otherwise. I have manually checked the changes made by the script to make sure they are sound. > > There are no clear ownership of this code, but I believe it's kind of hotspot-related. Looks fine. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6354 From stuefe at openjdk.java.net Fri Nov 12 08:24:34 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 12 Nov 2021 08:24:34 GMT Subject: RFR: 8277012: Use blessed modifier order in src/utils In-Reply-To: References: Message-ID: <0dDzCW4HXazTrH_66L0Jrq9MIx2k86QqIgqBNPwh-lg=.f8928984-7a43-4b4a-8a23-245b79c4aa65@github.com> On Thu, 11 Nov 2021 14:32:18 GMT, Magnus Ihse Bursie wrote: > I ran bin/blessed-modifier-order.sh on source code in src/utils. This scripts verifies that modifiers are in the "blessed" order, and fixes it otherwise. I have manually checked the changes made by the script to make sure they are sound. > > There are no clear ownership of this code, but I believe it's kind of hotspot-related. +1 ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6354 From chagedorn at openjdk.java.net Fri Nov 12 09:22:36 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 12 Nov 2021 09:22:36 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information [v2] In-Reply-To: <7Kz38EH-p2ElAwqeGSlbMnaYXH2kpzTTYldNhvA5buM=.495318a2-1505-442f-9ae9-6ae1ebcd11c4@github.com> References: <7Kz38EH-p2ElAwqeGSlbMnaYXH2kpzTTYldNhvA5buM=.495318a2-1505-442f-9ae9-6ae1ebcd11c4@github.com> Message-ID: On Fri, 12 Nov 2021 03:21:46 GMT, Dean Long wrote: >> src/hotspot/share/ci/ciReplay.cpp line 645: >> >>> 643: _version = parse_int("version"); >>> 644: if (_version > REPLAY_VERSION) { >>> 645: report_error("unrecognized version"); >> >> Would be nice to print both versions numbers in error message. >> Also I would like to be able ignore such error and process file anyway. Is `report_error` allows it? > >> Would be nice to print both versions numbers in error message. >> Also I would like to be able ignore such error and process file anyway. Is report_error allows it? 
> > Currently report_error() saves the error string to be printed later, so to have an error message that requires formatting, I guess I would have to allocate the string using malloc or ResourceObj memory. > Right now the only ignore flag is ReplayIgnoreInitErrors. I could introduce something like ReplayIgnoreAllErrors , or maybe turn this error into a warning. > Christian is waiting on this version number support, so maybe I could create a separate RFE for the above suggestions? It probably makes sense to turn this into a warning for now and file a follow up RFE as you have suggested. ------------- PR: https://git.openjdk.java.net/jdk/pull/6344 From chagedorn at openjdk.java.net Fri Nov 12 09:22:37 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 12 Nov 2021 09:22:37 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 16:59:39 GMT, Vladimir Kozlov wrote: >> Dean Long has updated the pull request incrementally with three additional commits since the last revision: >> >> - _current_mileage field is never used, stub out access >> - initialize _version to 0 >> - remove comment > > src/hotspot/share/ci/ciReplay.cpp line 837: > >> 835: rec->_state = parse_int("state"); >> 836: if (_version < 1) { >> 837: parse_int("current_mileage"); > > Why it is not assigned to `rec->_current_mileage` here? I guess we could leave this in for old replay files with the initialization further down in `ciReplay::initialize()` if `_version < 1`. What do you think @dean-long ? You should also update the method comment on L805: `` -> `` (or change it to `/` when leaving in the support for old replay files?). 
------------- PR: https://git.openjdk.java.net/jdk/pull/6344 From chagedorn at openjdk.java.net Fri Nov 12 09:33:33 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 12 Nov 2021 09:33:33 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information [v2] In-Reply-To: References: <7Kz38EH-p2ElAwqeGSlbMnaYXH2kpzTTYldNhvA5buM=.495318a2-1505-442f-9ae9-6ae1ebcd11c4@github.com> Message-ID: On Fri, 12 Nov 2021 08:55:29 GMT, Christian Hagedorn wrote: >>> Would be nice to print both versions numbers in error message. >>> Also I would like to be able ignore such error and process file anyway. Is report_error allows it? >> >> Currently report_error() saves the error string to be printed later, so to have an error message that requires formatting, I guess I would have to allocate the string using malloc or ResourceObj memory. >> Right now the only ignore flag is ReplayIgnoreInitErrors. I could introduce something like ReplayIgnoreAllErrors , or maybe turn this error into a warning. >> Christian is waiting on this version number support, so maybe I could create a separate RFE for the above suggestions? > > It probably makes sense to turn this into a warning for now and file a follow up RFE as you have suggested. However, thinking again about this, it should not happen that we parse a version number that's not supported. Maybe we should keep the error as it is indeed unexpected. It should also be easy to check the replay file manually in the error case to see which version number it had. Old replay files should still work as there is no "version X" line. Maybe you should also add `|| _version < 0` on L644. But the RFE still makes sense to improve the error reporting and to think about a new flag to ignore all errors. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6344 From mcimadamore at openjdk.java.net Fri Nov 12 11:16:17 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Fri, 12 Nov 2021 11:16:17 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v24] In-Reply-To: References: Message-ID: > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Adopt blessed modofier order ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/8c3860f8..79d3d685 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=23 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=22-23 Stats: 7 lines in 6 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From coleenp at openjdk.java.net Fri Nov 12 13:10:44 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Nov 2021 13:10:44 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2] In-Reply-To: <83KgAcV3vKaG38tiMA6R4qOZdOqIIBK-ZTIxWLh65lc=.98eea96e-53ef-4bad-b627-46b12f47d35c@github.com> References: <83KgAcV3vKaG38tiMA6R4qOZdOqIIBK-ZTIxWLh65lc=.98eea96e-53ef-4bad-b627-46b12f47d35c@github.com> Message-ID: <0gBVXgVFX1iEGdOhB_4TlrkqIAB5997vYcTMqf38YFg=.bc8f3ef5-f8f1-4afd-be13-1e0fb459a439@github.com> On Thu, 11 Nov 2021 22:02:35 GMT, Jorn Vernee wrote: >> Yes, it's used in >> cpu/x86/universalUpcallHandler_x86_64.cpp: SharedRuntime::compute_move_order(in_sig_bt, > > FWIW, I have a change 
in panama-foreign repo that replaces that use with a custom class. Will remove ComputeMoveOrder there as well, and it should be completely gone after the next JEP integration, probably in 19 (the JEP for 18 doesn't include that change). Ok, thanks Jorn. ------------- PR: https://git.openjdk.java.net/jdk/pull/6343 From coleenp at openjdk.java.net Fri Nov 12 13:10:44 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Nov 2021 13:10:44 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 22:17:41 GMT, Martin Doerr wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Some platform adjustments. > > LGTM. Thanks! Thanks for reviewing @TheRealMDoerr . @shipilev does this look good now? ------------- PR: https://git.openjdk.java.net/jdk/pull/6343 From shade at openjdk.java.net Fri Nov 12 14:07:37 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 12 Nov 2021 14:07:37 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 16:19:01 GMT, Coleen Phillimore wrote: >> This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message. >> Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Some platform adjustments. Yeah, I am fine with this. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6343 From ihse at openjdk.java.net Fri Nov 12 14:12:38 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Fri, 12 Nov 2021 14:12:38 GMT Subject: Integrated: 8277012: Use blessed modifier order in src/utils In-Reply-To: References: Message-ID: <4JP7H2okj8RsYPM_9UpEJKDJwjmryFGpfxf1_7vZICI=.020e8c9f-1035-4a5a-b0e8-d47c121833f9@github.com> On Thu, 11 Nov 2021 14:32:18 GMT, Magnus Ihse Bursie wrote: > I ran bin/blessed-modifier-order.sh on source code in src/utils. This script verifies that modifiers are in the "blessed" order, and fixes them otherwise. I have manually checked the changes made by the script to make sure they are sound. > > There is no clear ownership of this code, but I believe it's kind of hotspot-related. This pull request has now been integrated. Changeset: c4b44329 Author: Magnus Ihse Bursie URL: https://git.openjdk.java.net/jdk/commit/c4b44329c1d250f790ca82dd419cdf3330da16f5 Stats: 25 lines in 10 files changed: 0 ins; 0 del; 25 mod 8277012: Use blessed modifier order in src/utils Reviewed-by: dholmes, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/6354 From coleenp at openjdk.java.net Fri Nov 12 14:25:58 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Nov 2021 14:25:58 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v3] In-Reply-To: References: Message-ID: <9X29Q3z7nla7QEjfl1RqUQU6K3spIAuAtQFiCxNiCHM=.9c630211-5934-4b8f-ba39-987a2b41f845@github.com> > This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message. > Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge master - Some platform adjustments.
- 8258192: Obsolete the CriticalJNINatives flag ------------- Changes: https://git.openjdk.java.net/jdk/pull/6343/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6343&range=02 Stats: 1849 lines in 24 files changed: 0 ins; 1673 del; 176 mod Patch: https://git.openjdk.java.net/jdk/pull/6343.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6343/head:pull/6343 PR: https://git.openjdk.java.net/jdk/pull/6343 From pchilanomate at openjdk.java.net Fri Nov 12 15:15:35 2021 From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo) Date: Fri, 12 Nov 2021 15:15:35 GMT Subject: RFR: 8276658: Clean up JNI local handles code [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 13:58:06 GMT, Coleen Phillimore wrote: >> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread. >> Move the fields to JavaThread and adding JavaThread* argument. >> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed. >> Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code. >> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6. >> The commits in this change have been performance tested individually and together with no meaningful differences from mainline. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add _is_running initialization. Hi Coleen, Cleanup looks good to me. 
Thanks, Patricio src/hotspot/share/jfr/dcmd/jfrDcmds.cpp line 181: > 179: JNIHandleMark jni_handle_management(THREAD); > 180: > 181: DEBUG_ONLY(JfrJavaSupport::check_java_thread_in_vm(THREAD)); This method will call into Java below which already checks the thread is in vm so maybe this is not necessary. Even construct_dcmd_instance() has that assert. ------------- Marked as reviewed by pchilanomate (Committer). PR: https://git.openjdk.java.net/jdk/pull/6336 From duke at openjdk.java.net Fri Nov 12 16:18:04 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Fri, 12 Nov 2021 16:18:04 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v3] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. 
This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required.
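The sign-on-store, authenticate-on-load scheme described above can be illustrated with a toy model. The Java sketch below is not how HotSpot or the AArch64 hardware implements PAC: the class name `PacToyModel`, the 48-bit address / 16-bit signature layout, and the mixing function in `mac` are all assumptions made purely for illustration (real hardware uses a dedicated cipher, a secret key, and a modifier such as the stack pointer). It only shows why a stored return address that has been altered fails the authentication step.

```java
// Toy model of pointer signing; every name and constant here is hypothetical.
class PacToyModel {
    // Assumed layout: low 48 bits hold the address, high 16 bits hold the signature.
    static final int ADDRESS_BITS = 48;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    // Stand-in for the hardware keyed MAC. This is a simple bit mixer, not a real cipher.
    static long mac(long address, long modifier, long key) {
        long h = address ^ Long.rotateLeft(modifier, 17) ^ key;
        h *= 0x9E3779B97F4A7C15L;
        h ^= h >>> 29;
        return h >>> ADDRESS_BITS; // keep the top 16 bits as the signature
    }

    // "Sign" a return address before it is stored on the stack.
    static long sign(long address, long modifier, long key) {
        long plain = address & ADDRESS_MASK;
        return plain | (mac(plain, modifier, key) << ADDRESS_BITS);
    }

    // "Authenticate" a value loaded back from the stack; reject it if the signature is wrong.
    static long authenticate(long signedPointer, long modifier, long key) {
        long plain = signedPointer & ADDRESS_MASK;
        long stored = signedPointer >>> ADDRESS_BITS;
        if (stored != mac(plain, modifier, key)) {
            throw new IllegalStateException("authentication failed");
        }
        return plain;
    }

    public static void main(String[] args) {
        long key = 0x5DEECE66DL;       // hypothetical secret key
        long stackPointer = 0x7ffd1000L; // used as the modifier
        long returnAddress = 0x401234L;

        long onStack = sign(returnAddress, stackPointer, key);
        System.out.println(Long.toHexString(authenticate(onStack, stackPointer, key)));

        long tampered = onStack ^ (1L << ADDRESS_BITS); // flip one signature bit
        try {
            authenticate(tampered, stackPointer, key);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model a failed check throws an exception; the description above says that on real hardware the failure shows up as a segfault when the function returns.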
Alan Hayward has updated the pull request incrementally with two additional commits since the last revision: - Document pauth functions && remove OS split - Update UseROPProtection description ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/29471d30..25e62492 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=01-02 Stats: 369 lines in 9 files changed: 129 ins; 219 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Fri Nov 12 16:18:07 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Fri, 12 Nov 2021 16:18:07 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 08:48:07 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. 
All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required.
> > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Simplify branch protection configure check *Updated UseROPProtection message *Moved pauth functions into single file *Added comments *Removed superfluous modifier arg from macroassembler funcs ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From coleenp at openjdk.java.net Fri Nov 12 16:22:05 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Nov 2021 16:22:05 GMT Subject: RFR: 8276658: Clean up JNI local handles code [v3] In-Reply-To: References: Message-ID: > JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread. > Move the fields to JavaThread and adding JavaThread* argument. > Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed. > Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code. > The commits are separate to help reviewing, but the entire change has been tested together with tier1-6. > The commits in this change have been performance tested individually and together with no meaningful differences from mainline. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Remove redundant assert. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6336/files - new: https://git.openjdk.java.net/jdk/pull/6336/files/f31dfeee..f24e32c1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6336&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6336&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6336.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6336/head:pull/6336 PR: https://git.openjdk.java.net/jdk/pull/6336 From coleenp at openjdk.java.net Fri Nov 12 16:22:10 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Nov 2021 16:22:10 GMT Subject: RFR: 8276658: Clean up JNI local handles code [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 13:58:06 GMT, Coleen Phillimore wrote: >> JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread. >> Move the fields to JavaThread and adding JavaThread* argument. >> Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed. >> Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code. >> The commits are separate to help reviewing, but the entire change has been tested together with tier1-6. >> The commits in this change have been performance tested individually and together with no meaningful differences from mainline. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add _is_running initialization. Thanks for the review, Patricio and David. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6336 From coleenp at openjdk.java.net Fri Nov 12 16:22:15 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Nov 2021 16:22:15 GMT Subject: RFR: 8276658: Clean up JNI local handles code [v2] In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 15:08:24 GMT, Patricio Chilano Mateo wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add _is_running initialization. > > src/hotspot/share/jfr/dcmd/jfrDcmds.cpp line 181: > >> 179: JNIHandleMark jni_handle_management(THREAD); >> 180: >> 181: DEBUG_ONLY(JfrJavaSupport::check_java_thread_in_vm(THREAD)); > > This method will call into Java below which already checks the thread is in vm so maybe this is not necessary. Even construct_dcmd_instance() has that assert. You're right, it's doubly redundant. I'll remove it. ------------- PR: https://git.openjdk.java.net/jdk/pull/6336 From coleenp at openjdk.java.net Fri Nov 12 16:22:16 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Nov 2021 16:22:16 GMT Subject: Integrated: 8276658: Clean up JNI local handles code In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 17:16:29 GMT, Coleen Phillimore wrote: > JNI Local handles can only be created by JavaThread (there's an assert in make_local) but the fields are added to Thread. > Move the fields to JavaThread and adding JavaThread* argument. > Also, the global freelist isn't very useful now that global JNI handles don't use JNIHandleBlock, so the locking that claims incorrectly to block for safepoint is removed. > Lastly, there's at least 3 places that duplicate pushing a new JNIHandleBlock to the thread for temporarily adding JNI local handles. These have been moved to common code with a JNIHandleMark object, moved from jvmci code. > The commits are separate to help reviewing, but the entire change has been tested together with tier1-6. 
> The commits in this change have been performance tested individually and together with no meaningful differences from mainline. This pull request has now been integrated. Changeset: 3b2585c0 Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/3b2585c02bd9d66cc2c8b2d5c16e9a48f4280d07 Stats: 425 lines in 25 files changed: 75 ins; 302 del; 48 mod 8276658: Clean up JNI local handles code Reviewed-by: dholmes, pchilanomate ------------- PR: https://git.openjdk.java.net/jdk/pull/6336 From kvn at openjdk.java.net Fri Nov 12 17:02:44 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 12 Nov 2021 17:02:44 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information [v2] In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 03:36:59 GMT, Dean Long wrote: >> The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: >> 1. added a version number to the replay file >> 2. removed unused ci fields >> 3. corrected comment in TestLambdas.java > > Dean Long has updated the pull request incrementally with three additional commits since the last revision: > > - _current_mileage field is never used, stub out access > - initialize _version to 0 > - remove comment okay ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6344 From kvn at openjdk.java.net Fri Nov 12 17:02:44 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 12 Nov 2021 17:02:44 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information [v2] In-Reply-To: References: <7Kz38EH-p2ElAwqeGSlbMnaYXH2kpzTTYldNhvA5buM=.495318a2-1505-442f-9ae9-6ae1ebcd11c4@github.com> Message-ID: On Fri, 12 Nov 2021 09:30:38 GMT, Christian Hagedorn wrote: >> It probably makes sense to turn this into a warning for now and file a follow up RFE as you have suggested.
> > However, thinking again about this, it should not happen that we parse a version number that's not supported. Maybe we should keep the error as it is indeed unexpected. It should also be easy to check the replay file manually in the error case to see which version number it had. Old replay files should still work as there is no "version X" line. Maybe you should also add `|| _version < 0` on L644. > > But the RFE still makes sense to improve the error reporting and to think about a new flag to ignore all errors. Yes, file separate RFE. ------------- PR: https://git.openjdk.java.net/jdk/pull/6344 From coleenp at openjdk.java.net Fri Nov 12 17:06:46 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Nov 2021 17:06:46 GMT Subject: RFR: 8258192: Obsolete the CriticalJNINatives flag [v2] In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 14:04:42 GMT, Aleksey Shipilev wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Some platform adjustments. > > Yeah, I am fine with this. Thanks @shipilev . All the GHA passed after resolving above merge conflict. ------------- PR: https://git.openjdk.java.net/jdk/pull/6343 From coleenp at openjdk.java.net Fri Nov 12 17:06:46 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Nov 2021 17:06:46 GMT Subject: Integrated: 8258192: Obsolete the CriticalJNINatives flag In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 22:06:05 GMT, Coleen Phillimore wrote: > This change removes the disabled CriticalJNINatives code, and the flag now gives an obsolete message. > Tested with tier1 on cpus x64, aarch64, and builds on linux-x86-open,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug. This pull request has now been integrated. 
Changeset: 0d2980cd Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/0d2980cdd1486b0689a71fc107a1d4c100bd3025 Stats: 1849 lines in 24 files changed: 0 ins; 1673 del; 176 mod 8258192: Obsolete the CriticalJNINatives flag Reviewed-by: mdoerr, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/6343 From aph at openjdk.java.net Fri Nov 12 17:39:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 12 Nov 2021 17:39:41 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v3] In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 16:18:04 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. 
>> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request incrementally with two additional commits since the last revision: > > - Document pauth functions && remove OS split > - Update UseROPProtection description src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5254: > 5252: // Also use before signing to check that the pointer is valid and hasn't already been signed. > 5253: // > 5254: void MacroAssembler::check_return_address(Register return_reg) { This commentary is excellent. Thanks.
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From dlong at openjdk.java.net Fri Nov 12 20:23:07 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 12 Nov 2021 20:23:07 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information [v3] In-Reply-To: References: Message-ID: > The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: > 1. added a version number to the replay file > 2. removed unused ci fields > 3. corrected comment in TestLambdas.java Dean Long has updated the pull request incrementally with two additional commits since the last revision: - turn version error into a warning - updated syntax comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6344/files - new: https://git.openjdk.java.net/jdk/pull/6344/files/20bea849..46fd3fac Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6344.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6344/head:pull/6344 PR: https://git.openjdk.java.net/jdk/pull/6344 From dlong at openjdk.java.net Fri Nov 12 20:25:37 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 12 Nov 2021 20:25:37 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information [v2] In-Reply-To: References: Message-ID: <9HXkfUAMDo56djXYPGHCOEKl3dT3x-DA7vjGeY1P6Xw=.94c4d6af-dc9b-434d-83fe-3c8324794040@github.com> On Fri, 12 Nov 2021 03:36:59 GMT, Dean Long wrote: >> The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: >> 1. added a version number to the replay file >> 2. removed unused ci fields >> 3.
corrected comment in TestLambdas.java > > Dean Long has updated the pull request incrementally with three additional commits since the last revision: > > - _current_mileage field is never used, stub out access > - initialize _version to 0 > - remove comment > I guess we could leave this in for old replay files with the initialization further down in ciReplay::initialize() if _version < 1. Yes, it's necessary to parse the value for old replay files, but the value is never used. I'm not sure what you are suggesting about the initialization further down. ------------- PR: https://git.openjdk.java.net/jdk/pull/6344 From dlong at openjdk.java.net Fri Nov 12 20:33:37 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 12 Nov 2021 20:33:37 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information [v3] In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 20:23:07 GMT, Dean Long wrote: >> The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: >> 1. added a version number to the replay file >> 2. removed unused ci fields >> 3. corrected comment in TestLambdas.java > > Dean Long has updated the pull request incrementally with two additional commits since the last revision: > > - turn version error into a warning > - updated syntax comment > However, thinking again about this, it should not happen that we parse a version number that's not supported A user could be using an older JDK but accidentally try a newer replay file. That was the scenario I had in mind.
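The version handling discussed in this thread (old replay files have no "version X" line, and a replay file written by a newer JDK may carry a version the reader does not know) can be sketched as follows. This is a minimal Java illustration, not the actual ciReplay code, which is C++ in HotSpot: the class name `ReplayVersionCheck`, the constant `SUPPORTED_VERSION = 1`, and the exact line format are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a replay-file version check; not the HotSpot implementation.
class ReplayVersionCheck {
    // Assumed highest version this reader understands.
    static final int SUPPORTED_VERSION = 1;

    // Returns the version found in the file, or 0 when there is no "version X" line
    // (the case of an old replay file).
    static int parseVersion(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2 && parts[0].equals("version")) {
                return Integer.parseInt(parts[1]);
            }
        }
        return 0;
    }

    // Rejects negative versions and versions newer than the reader supports.
    static boolean isSupported(int version) {
        return version >= 0 && version <= SUPPORTED_VERSION;
    }

    public static void main(String[] args) {
        List<String> oldFile = Arrays.asList("placeholder line without a version");
        List<String> newerFile = Arrays.asList("version 2", "placeholder line");
        System.out.println(isSupported(parseVersion(oldFile)));
        System.out.println(isSupported(parseVersion(newerFile)));
    }
}
```

Whether an unsupported version should produce a warning or an error is the open question in the messages above; the sketch only reports the result and leaves that policy to the caller.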
------------- PR: https://git.openjdk.java.net/jdk/pull/6344 From dlong at openjdk.java.net Fri Nov 12 20:40:07 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 12 Nov 2021 20:40:07 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information [v4] In-Reply-To: References: Message-ID: > The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: > 1. added a version number to the replay file > 2. removed unused ci fields > 3. corrected comment in TestLambdas.java Dean Long has updated the pull request incrementally with one additional commit since the last revision: strengthen version check ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6344/files - new: https://git.openjdk.java.net/jdk/pull/6344/files/46fd3fac..0552e47a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6344&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6344.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6344/head:pull/6344 PR: https://git.openjdk.java.net/jdk/pull/6344 From simonis at openjdk.java.net Sat Nov 13 00:32:13 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Sat, 13 Nov 2021 00:32:13 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v9] In-Reply-To: References: Message-ID: > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance.
A prominent example of such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): > > public static boolean isAlpha(int c) { > try { > return IS_ALPHA[c]; > } catch (ArrayIndexOutOfBoundsException ex) { > return false; > } > } > > > ### Solution > > Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: > > -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions > Benchmark (exceptionProbability) Mode Cnt Score Error Units > ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op > ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op > ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op > ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op > > -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions > Benchmark (exceptionProbability) Mode Cnt Score Error Units > ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op > ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op > ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op > ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op > > > ### Implementation details > > - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. > - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception. > - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. > - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Fix IR Framework test Traps::classCheck() which now behaves differently with -XX:+OptimizeImplicitExceptions ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5488/files - new: https://git.openjdk.java.net/jdk/pull/5488/files/b3c130c8..536f5398 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=07-08 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From leonid.mesnik at oracle.com Sat Nov 13 04:08:34 2021 From: leonid.mesnik at oracle.com (Leonid Mesnik) Date: Sat, 13 Nov 2021 04:08:34 +0000 Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore? 
In-Reply-To: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com>
Message-ID:

Hi

It is a hotspot testing problem rather than a jtreg problem. So I've added hotspot-dev at openjdk.java.net alias.

It seems the problem is that the WhiteBox API used in testing doesn't correspond to the JDK being tested. This commit changed the WhiteBox.canWriteJavaHeapArchive() method:
https://github.com/openjdk/jdk/commit/922e86f4ff28c7b17af8e7b5867a40fc76b7fdd7#diff-e75d116b35afd951f114c2b0793b26d0009b441653d6b28d611afcbe0106dfd0

So you might see this linkage error if you try to test an older version of the JDK while the tests have these changes.

Could you please check that you use exactly the same sources during testing which have been used to build JDK.

Leonid

From: jtreg-use on behalf of Jaikiran Pai
Date: Friday, November 12, 2021 at 8:40 PM
To: jtreg-use at openjdk.java.net
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?

In order to reproduce one of the issues I have been looking into, I've been trying to run a jtreg test case against a Java 17 installation. The command I use is:

java -jar jtreg.jar -jdk: test/jdk/java/..../SomeTest.java

This runs into the following exception:

failed to get value for vm.cds.write.archived.java.heap
java.lang.UnsatisfiedLinkError: 'boolean jdk.test.whitebox.WhiteBox.canWriteJavaHeapArchive()'
    at jdk.test.whitebox.WhiteBox.canWriteJavaHeapArchive(Native Method)
    at requires.VMProps.vmCDSCanWriteArchivedJavaHeap(VMProps.java:413)
    at requires.VMProps$SafeMap.put(VMProps.java:72)
    at requires.VMProps.call(VMProps.java:113)
    at requires.VMProps.call(VMProps.java:60)
    at com.sun.javatest.regtest.agent.GetJDKProperties.run(GetJDKProperties.java:80)
    at com.sun.javatest.regtest.agent.GetJDKProperties.main(GetJDKProperties.java:54)
Test results: failed: 1

Is this something I am doing wrong or is it some genuine issue? I haven't been able to run jtreg against a downloaded/installed JDK for many weeks now. Initially I thought I had somehow messed up my local jdk source repo setup so didn't pay much attention to the failures. But now, I'm trying this on a completely different clean setup and that too runs into this issue.

Here's the output of jtreg -version:

jtreg 6.1-dev+1
Installed in \jtreg\lib\jtreg.jar
Running on platform version 17.0.1 from \jdk-17.0.1.
Built with Java(TM) 2 SDK, Version 1.8.0_312-b07 on November 12, 2021.
Copyright (c) 1999, 2021, Oracle and/or its affiliates. All rights reserved.
Use is subject to license terms.
JT Harness, version 6.0 ea b14 (November 12, 2021)
JCov 3.0-2
Java Assembler Tools, version 7.0 ea b09 (November 12, 2021)
TestNG (testng.jar): version 7.3.0
TestNG (jcommander.jar): version unknown
TestNG (guice.jar): version 4.2.3
JUnit (junit.jar): version 4.13.2
JUnit (hamcrest.jar): version 2.2

-Jaikiran

From jai.forums2013 at gmail.com  Sat Nov 13 05:37:46 2021
From: jai.forums2013 at gmail.com (Jaikiran Pai)
Date: Sat, 13 Nov 2021 11:07:46 +0530
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com> <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com>
Message-ID: <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com>

I got past this with an extensive workaround for now. I moved/copied that test case java file outside of the JDK source tree, then created a new/custom TEST.ROOT which is very minimal and has no reference to whitebox for bootlibs, then made sure the jtwork directory is also outside of the JDK source tree (so that the test is compiled afresh) and then ran that test. That helped, but it's only for this test since its requirements in the test are very minimal. I don't see a way to get past this if I have to run the wider range of jtreg tests that reside in the JDK source tree against a pre-built/downloaded Java 17 or any previous versions.

-Jaikiran

On 13/11/21 10:26 am, Jaikiran Pai wrote:
> Hello Leonid,
>
> On 13/11/21 9:38 am, Leonid Mesnik wrote:
>> Hi
>>
>> It is a hotspot testing problem rather than a jtreg problem. So I've
>> added
>> hotspot-dev at openjdk.java.net alias.
> Thank you for adding the right list.
>> ...
>> Could you please check that you use exactly the same sources during
>> testing which have been used to build JDK.
>
> Do you mean the sources of the JDK against which the test is being
> run? I don't have those sources since this test runs against a
> pre-built binary downloaded from https://jdk.java.net/17/
>
> -Jaikiran
>

From stuefe at openjdk.java.net  Sat Nov 13 06:11:43 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Sat, 13 Nov 2021 06:11:43 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 11 Nov 2021 06:30:15 GMT, Thomas Stuefe wrote:

>> This is part of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot. For the whole story please refer to https://bugs.openjdk.java.net/browse/JDK-8275301.
>>
>> This proposal adds NMT buffer overflow checking. As laid out in JDK-8275301:
>>
>> - it would give us C-heap overflow checking in release builds
>> - the additional costs are negligible
>> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. The error reports would also be confusing.
>> - it is a preparation for future code removal (the memory guarding done in debug only in os::malloc() and friends, and possibly the guarding done with CheckJNICalls)
>>
>> Patch notes:
>>
>> 1) The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation.
>>
>> On 64-bit, we don't even need to enlarge the malloc header: we carve some bits out by decreasing the size of the bucket index bit field to 16 bits. The bucket index field is used to store the bucket slot of the malloc site table in NMT detail mode. The malloc site table width is 512 at the moment, so 65k gives plenty of room for growing the malloc site table should we ever want to.
>>
>> On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes. That is because there were not enough bits to spare for a canary. On the upside, 8 bytes were not enough anyway, strictly speaking, to guarantee proper alignment e.g. for 128-bit data types on all 32-bit platforms. See e.g. the malloc alignment that glibc uses.
>>
>> I also took the liberty of re-arranging the malloc header fields a bit to minimize the difference between 32-bit and 64-bit platforms, and to align each field optimally according to its size. I also switched from bitfields to real types in order to be able to do a sizeof() on them.
>>
>> For more details, see the comment in mallocTracker.hpp.
>>
>> 2) I added a footer canary trailing the user allocation to catch tail buffer overruns. For simplicity reasons (alignment) and to save some cycles I made it a byte only. That is enough to catch most overrun scenarios. If you think this is too small, I'm open to changing it.
>>
>> 3) I put a bit of work into error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
>>
>> 4) I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
>>
>> (Note that these gtests, to test anything, need to run with NMT switched on. We do this as part of our NMT jtreg-controlled gtests in tier1).
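
For readers who want to picture patch notes 1) and 2), here is a minimal sketch of the header-canary plus footer-canary idea. It is an illustration only and not HotSpot's actual `MallocHeader`: the names `ToyHeader`, `toy_malloc`, `toy_check`, `toy_free`, the field layout and both canary values are assumptions made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical header layout for illustration, NOT HotSpot's MallocHeader.
// The canary is the last field, so it sits directly in front of the payload.
struct ToyHeader {
  std::uint64_t size;    // payload size in bytes
  std::uint32_t flags;   // stand-in for other bookkeeping data
  std::uint16_t bucket;  // stand-in for a 16-bit bucket index
  std::uint16_t canary;  // 16-bit canary directly preceding the payload
};
static_assert(sizeof(ToyHeader) == 16, "sketch assumes a 16-byte header");

// Arbitrary values chosen for this sketch.
constexpr std::uint16_t kHeaderCanary = 0xE99E;
constexpr std::uint8_t kFooterCanary = 0xE8;

enum class BlockState { Intact, HeaderBroken, FooterBroken };

// Allocates header + payload + one footer byte and returns the payload.
void* toy_malloc(std::size_t size) {
  auto* raw = static_cast<unsigned char*>(
      std::malloc(sizeof(ToyHeader) + size + 1));
  if (raw == nullptr) return nullptr;
  ToyHeader h{size, 0, 0, kHeaderCanary};
  std::memcpy(raw, &h, sizeof h);
  raw[sizeof(ToyHeader) + size] = kFooterCanary;
  return raw + sizeof(ToyHeader);
}

// Verifies both canaries of a block returned by toy_malloc().
BlockState toy_check(const void* payload) {
  assert(payload != nullptr);
  const auto* raw =
      static_cast<const unsigned char*>(payload) - sizeof(ToyHeader);
  ToyHeader h;
  std::memcpy(&h, raw, sizeof h);
  if (h.canary != kHeaderCanary) return BlockState::HeaderBroken;
  if (raw[sizeof(ToyHeader) + h.size] != kFooterCanary) {
    return BlockState::FooterBroken;
  }
  return BlockState::Intact;
}

// Releases a block returned by toy_malloc().
void toy_free(void* payload) {
  if (payload == nullptr) return;
  std::free(static_cast<unsigned char*>(payload) - sizeof(ToyHeader));
}
```

In this sketch, writing a single byte past the end of the payload flips the footer byte and `toy_check()` reports `FooterBroken`, while a write just in front of the payload damages the 16-bit canary and is reported as `HeaderBroken`. A real implementation would additionally have to handle realloc, alignment guarantees and error reporting, none of which is shown here.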
>>
>> Even though the patch adds more code than it removes, it prepares possible code removal (if we can agree to do that) and the net result will be less complexity, not more. Again, see JDK-8275301 for details.
>>
>> --------------
>>
>> Example output a buffer overrun would provide:
>>
>>
>> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1:
>> 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
>> 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
>> 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
>> # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
>> #
>>
>> -------
>>
>> Tests:
>> - manual tests with Linux x64, x86, minimal build
>> - GHAs all clean
>> - SAP nightlies ran for 14 days in a row without problems
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision:
>
> - Merge
> - Let NMT do overflow detection

Friendly Ping.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From david.holmes at oracle.com  Sat Nov 13 06:37:53 2021
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 13 Nov 2021 16:37:53 +1000
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com> <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com> <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com>
Message-ID: <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>

On 13/11/2021 3:37 pm, Jaikiran Pai wrote:
> I got past this with an extensive workaround for now. I moved/copied
> that test case java file outside of the JDK source tree, then created a
> new/custom TEST.ROOT which is very minimal and has no reference to
> whitebox for bootlibs, then made sure the jtwork directory is also
> outside of the JDK source tree (so that the test is compiled afresh) and
> then ran that test. That helped, but it's only for this test since its
> requirements in the test are very minimal. I don't see a way to get past
> this if I have to run the wider range of jtreg tests that reside in the
> JDK source tree against a pre-built/downloaded Java 17 or any previous
> versions.

Basically you're not supposed to do that. You have to test a given binary with the tests that existed when that binary was built. Many things in the tests can change that will fail to run with an older JDK.

In theory you can use the build number of the binary JDK to check out the tests corresponding to that build using the appropriate build tag.

Cheers,
David

> -Jaikiran
>
> On 13/11/21 10:26 am, Jaikiran Pai wrote:
>> Hello Leonid,
>>
>> On 13/11/21 9:38 am, Leonid Mesnik wrote:
>>> Hi
>>>
>>> It is a hotspot testing problem rather than a jtreg problem. So I've
>>> added
>>> hotspot-dev at openjdk.java.net alias.
>> Thank you for adding the right list.
>>> ...
>>> Could you please check that you use exactly the same sources during
>>> testing which have been used to build JDK.
>>
>> Do you mean the sources of the JDK against which the test is being
>> run? I don't have those sources since this test runs against a
>> pre-built binary downloaded from https://jdk.java.net/17/
>>
>> -Jaikiran
>>

From thomas.stuefe at gmail.com  Sat Nov 13 07:56:40 2021
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Sat, 13 Nov 2021 08:56:40 +0100
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com> <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com> <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com> <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>
Message-ID:

Maybe the easiest way for you would be to get the source drop matching the binary JDK from the vendor of your JDK, since you may also have vendor-specific changes (albeit rare, it's possible).

Cheers, Thomas

On Sat, Nov 13, 2021 at 7:38 AM David Holmes wrote:

> On 13/11/2021 3:37 pm, Jaikiran Pai wrote:
> > I got past this with an extensive workaround for now. I moved/copied
> > that test case java file outside of the JDK source tree, then created a
> > new/custom TEST.ROOT which is very minimal and has no reference to
> > whitebox for bootlibs, then made sure the jtwork directory is also
> > outside of the JDK source tree (so that the test is compiled afresh) and
> > then ran that test. That helped, but it's only for this test since its
> > requirements in the test are very minimal. I don't see a way to get past
> > this if I have to run the wider range of jtreg tests that reside in the
> > JDK source tree against a pre-built/downloaded Java 17 or any previous
> > versions.
>
> Basically you're not supposed to do that. You have to test a given
> binary with the tests that existed when that binary was built. Many
> things in the tests can change that will fail to run with an older JDK.
>
> In theory you can use the build number of the binary JDK to check out the
> tests corresponding to that build using the appropriate build tag.
>
> Cheers,
> David
>
> > -Jaikiran
> >
> > On 13/11/21 10:26 am, Jaikiran Pai wrote:
> >> Hello Leonid,
> >>
> >> On 13/11/21 9:38 am, Leonid Mesnik wrote:
> >>> Hi
> >>>
> >>> It is a hotspot testing problem rather than a jtreg problem. So I've
> >>> added
> >>> hotspot-dev at openjdk.java.net alias.
> >> Thank you for adding the right list.
> >>> ...
> >>> Could you please check that you use exactly the same sources during
> >>> testing which have been used to build JDK.
> >>
> >> Do you mean the sources of the JDK against which the test is being
> >> run? I don't have those sources since this test runs against a
> >> pre-built binary downloaded from https://jdk.java.net/17/
> >>
> >> -Jaikiran
> >>
>

From jai.forums2013 at gmail.com  Sat Nov 13 08:28:14 2021
From: jai.forums2013 at gmail.com (Jaikiran Pai)
Date: Sat, 13 Nov 2021 13:58:14 +0530
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com> <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com> <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com> <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>
Message-ID: <1fb7457b-d933-836a-5f97-f078a320ab72@gmail.com>

Hello David,

On 13/11/21 12:07 pm, David Holmes wrote:
> On 13/11/2021 3:37 pm, Jaikiran Pai wrote:
>> I got past this with an extensive workaround for now.
I moved/copied
>> that test case java file outside of the JDK source tree, then created
>> a new/custom TEST.ROOT which is very minimal and has no reference to
>> whitebox for bootlibs, then made sure the jtwork directory is also
>> outside of the JDK source tree (so that the test is compiled afresh)
>> and then ran that test. That helped, but it's only for this test
>> since its requirements in the test are very minimal. I don't see a
>> way to get past this if I have to run the wider range of jtreg tests
>> that reside in the JDK source tree against a pre-built/downloaded
>> Java 17 or any previous versions.
>
> Basically you're not supposed to do that.

I wasn't aware of that. I used to use this method to selectively run newly added jtreg tests against different downloaded versions of the JDK and assumed it was a supported use case.

Thanks everyone for the inputs.

-Jaikiran

From joe.darcy at oracle.com  Sat Nov 13 17:48:36 2021
From: joe.darcy at oracle.com (Joe Darcy)
Date: Sat, 13 Nov 2021 09:48:36 -0800
Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To:
References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com> <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com> <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com> <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com>
Message-ID: <0f2c573f-ae6c-c52d-c3a9-e1a92baf6f2d@oracle.com>

And the SCM hashes used to create a JDK build are one of the pieces of information in the $JDK/release file.

-Joe

On 11/12/2021 11:56 PM, Thomas Stüfe wrote:
> Maybe the easiest way for you would be to get the source drop matching the
> binary JDK from the vendor of your JDK. Since you may also have
> vendor-specific changes (albeit rare, it's possible).
>
> Cheers, Thomas
>
>
> On Sat, Nov 13, 2021 at 7:38 AM David Holmes
> wrote:
>
>> On 13/11/2021 3:37 pm, Jaikiran Pai wrote:
>>> I got past this with an extensive workaround for now. I moved/copied
>>> that test case java file outside of the JDK source tree, then created a
>>> new/custom TEST.ROOT which is very minimal and has no reference to
>>> whitebox for bootlibs, then made sure the jtwork directory is also
>>> outside of the JDK source tree (so that the test is compiled afresh) and
>>> then ran that test. That helped, but it's only for this test since its
>>> requirements in the test are very minimal. I don't see a way to get past
>>> this if I have to run the wider range of jtreg tests that reside in the
>>> JDK source tree against a pre-built/downloaded Java 17 or any previous
>>> versions.
>> Basically you're not supposed to do that. You have to test a given
>> binary with the tests that existed when that binary was built. Many
>> things in the tests can change that will fail to run with an older JDK.
>>
>> In theory you can use the build number of the binary JDK to checkout the
>> tests corresponding to that build using the appropriate build tag.
>>
>> Cheers,
>> David
>>
>>> -Jaikiran
>>>
>>> On 13/11/21 10:26 am, Jaikiran Pai wrote:
>>>> Hello Leonid,
>>>>
>>>> On 13/11/21 9:38 am, Leonid Mesnik wrote:
>>>>> Hi
>>>>>
>>>>> It is a hotspot testing problem rather than a jtreg problem. So I've
>>>>> added
>>>>> hotspot-dev at openjdk.java.net alias.
>>>> Thank you for adding the right list.
>>>>> ...
>>>>> Could you please check that you use exactly the same sources during
>>>>> testing which have been used to build JDK.
>>>> Do you mean the sources of the JDK against which the test is being
>>>> run? I don't have those sources since this test runs against a
>>>> pre-built binary downloaded from https://jdk.java.net/17/
>>>>
>>>> -Jaikiran
>>>>

From duke at openjdk.java.net  Mon Nov 15 09:07:11 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 15 Nov 2021 09:07:11 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To:
References:
Message-ID:

> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
> of its uses is to protect against ROP based attacks. This is done by
> signing the Link Register whenever it is stored on the stack, and
> authenticating the value when it is loaded back from the stack. If an
> attacker were to try to change control flow by editing the stack then
> the authentication check of the Link Register will fail, causing a
> segfault when the function returns.
>
> On a system with PAC enabled, it is expected that all applications will
> be compiled with ROP protection. Fedora 33 and upwards already provide
> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
>
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
>
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
>
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime.
If/when AOT
> compilation is added to the JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
>
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of them
> and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
>
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of the JDK, all the changes mentioned above would be
> required.

Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:

 - Merge master
 - Document pauth functions && remove OS split
 - Update UseROPProtection description
 - Simplify branch protection configure check
 - 8264130: PAC-RET protection for Linux/AArch64

   PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One of its uses is to protect against ROP based attacks. This is done by signing the Link Register whenever it is stored on the stack, and authenticating the value when it is loaded back from the stack. If an attacker were to try to change control flow by editing the stack then the authentication check of the Link Register will fail, causing a segfault when the function returns.

   On a system with PAC enabled, it is expected that all applications will be compiled with ROP protection. Fedora 33 and upwards already provide this. By compiling for ARMv8.0, GCC and LLVM will only use the set of PAC instructions that exist in the NOP space - on hardware without PAC, these instructions act as NOPs, allowing backward compatibility for negligible performance cost (2 NOPs per non-leaf function).

   Hardware is currently limited to the Apple M1 MacBooks. All testing has been done within a Fedora Docker image. A run of SpecJVM showed no difference to that of noise - which was surprising.

   The most important part of this patch is simply compiling using branch protection provided by GCC/LLVM. This protects all C++ code from being used in ROP attacks, removing all static ROP gadgets from use.

   The remainder of the patch adds ROP protection to runtime generated code, in both stubs and compiled Java code. Attacks here are much harder as ROP gadgets must be found dynamically at runtime. If/when AOT compilation is added to the JDK, then all stubs and compiled Java will be susceptible to ROP gadgets being found by static analysis and therefore potentially as vulnerable as C++ code.

   There are a number of places where the VM changes control flow by rewriting the stack or otherwise. I've done some analysis as to how these could also be used for attacks (which I didn't want to post here). These areas can be protected by ensuring the pointers to various stubs and entry points are stored in memory as signed pointers. These changes are simple to make (they can be reduced to a type change in common code and a few additional sign/auth calls in the backend), but there are a lot of them and the total code change is fairly large. I'm happy to provide a few work in progress patches.

   In order to match the security benefits of the Apple Arm64e ABI across the whole of the JDK, all the changes mentioned above would be required.
 - Add PAC assembly instructions
 - Add AArch64 ROP protection runtime flag
 - Build with branch protection

-------------

Changes: https://git.openjdk.java.net/jdk/pull/6334/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=03
  Stats: 1436 lines in 25 files changed: 490 ins; 150 del; 796 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6334.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net  Mon Nov 15 10:15:40 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 15 Nov 2021 10:15:40 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To:
References:
Message-ID: <9F2FQ7FTjc4Jzjf63x0pKeb2VPMsjcPQ-iQUo_rwCf4=.16d9e002-e6c9-4a4a-922c-ccbdf6e00eab@github.com>

On Mon, 15 Nov 2021 09:07:11 GMT, Alan Hayward wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>>
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>>
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>>
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>>
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to the JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>>
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>>
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of the JDK, all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
>
>  - Merge master
>  - Document pauth functions && remove OS split
>  - Update UseROPProtection description
>  - Simplify branch protection configure check
>  - 8264130: PAC-RET protection for Linux/AArch64
>
>    PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>    of its uses is to protect against ROP based attacks. This is done by
>    signing the Link Register whenever it is stored on the stack, and
>    authenticating the value when it is loaded back from the stack. If an
>    attacker were to try to change control flow by editing the stack then
>    the authentication check of the Link Register will fail, causing a
>    segfault when the function returns.
>
>    On a system with PAC enabled, it is expected that all applications will
>    be compiled with ROP protection. Fedora 33 and upwards already provide
>    this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>    PAC instructions that exist in the NOP space - on hardware without PAC,
>    these instructions act as NOPs, allowing backward compatibility for
>    negligible performance cost (2 NOPs per non-leaf function).
>
>    Hardware is currently limited to the Apple M1 MacBooks. All testing has
>    been done within a Fedora Docker image. A run of SpecJVM showed no
>    difference to that of noise - which was surprising.
>
>    The most important part of this patch is simply compiling using branch
>    protection provided by GCC/LLVM. This protects all C++ code from being
>    used in ROP attacks, removing all static ROP gadgets from use.
>
>    The remainder of the patch adds ROP protection to runtime generated
>    code, in both stubs and compiled Java code. Attacks here are much harder
>    as ROP gadgets must be found dynamically at runtime. If/when AOT
>    compilation is added to the JDK, then all stubs and compiled Java will be
>    susceptible to ROP gadgets being found by static analysis and therefore
>    potentially as vulnerable as C++ code.
>
>    There are a number of places where the VM changes control flow by
>    rewriting the stack or otherwise. I've done some analysis as to how
>    these could also be used for attacks (which I didn't want to post here).
>    These areas can be protected by ensuring the pointers to various stubs and
>    entry points are stored in memory as signed pointers. These changes are
>    simple to make (they can be reduced to a type change in common code and
>    a few additional sign/auth calls in the backend), but there are a lot of them
>    and the total code change is fairly large. I'm happy to provide a few
>    work in progress patches.
>
>    In order to match the security benefits of the Apple Arm64e ABI across
>    the whole of the JDK, all the changes mentioned above would be
>    required.
>  - Add PAC assembly instructions
>  - Add AArch64 ROP protection runtime flag
>  - Build with branch protection

src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 452:

> 450:
> 451:   // only r0 is valid at this time, all other registers have been destroyed by the runtime call
> 452:   __ invalidate_registers(false, true, true, true, true, true);

Not so: `lr` is live.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From adinn at openjdk.java.net  Mon Nov 15 10:15:41 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Mon, 15 Nov 2021 10:15:41 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To:
References:
Message-ID: <9psGxGDAGJTaAW2jtH3v3A6jsuq4x7aOMXMgJEyeLLI=.21f995de-9192-4483-a378-0a54e3d3745d@github.com>

On Mon, 15 Nov 2021 09:07:11 GMT, Alan Hayward wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>>
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this.
>> [...]
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
>
> - Merge master
> - Document pauth functions && remove OS split
> - Update UseROPProtection description
> - Simplify branch protection configure check
> - 8264130: PAC-RET protection for Linux/AArch64
>   [...]
> - Add PAC assembly instructions
> - Add AArch64 ROP protection runtime flag
> - Build with branch protection

src/hotspot/cpu/aarch64/pauth_aarch64.hpp line 33:

> 31:
> 32: // Support for ROP Protection in VM code.
> 33: // This is provided by via the AArch64 PAC feature.

"by via" should just be "via"

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net Mon Nov 15 10:18:42 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 15 Nov 2021 10:18:42 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To:
References:
Message-ID: <81l6r4GfgLq9L4qhlvi_VWKE46vPqhspX-d7NG6Qux0=.4dbf25ed-4c3f-415b-9ffc-ddaf69211cf2@github.com>

On Mon, 15 Nov 2021 09:07:11 GMT, Alan Hayward wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>> [...]
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> [...]

src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 452:

> 450: // patch the return address, this stub will directly return to the exception handler
> 451: __ str(r0, Address(rfp, 1*BytesPerWord));
> 452:

Please explain the reason for this change, that leaves `lr` live across `restore_live_registers()`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net Mon Nov 15 10:23:38 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 15 Nov 2021 10:23:38 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 15 Nov 2021 09:07:11 GMT, Alan Hayward wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks.
>> [...]

src/hotspot/cpu/aarch64/pauth_aarch64.hpp line 132:

> 130: // Authenticate or strip a return value. Use for efficiency and only when the safety of the data
> 131: // isn't an issue - for example when viewing the stack.
> 132: //

So, whether this function authenticates or strips the address depends only on debugging? The vague name makes the callers hard to read.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From adinn at openjdk.java.net Mon Nov 15 10:31:52 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Mon, 15 Nov 2021 10:31:52 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To:
References:
Message-ID: <464NS7NEldWHQM0Q8PF_MzPfl9O0CUj9GjbeI_qdjEc=.758f4244-8239-4e5d-bb08-a0dec85c2a06@github.com>

On Mon, 15 Nov 2021 09:07:11 GMT, Alan Hayward wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks.
>> [...]
>
> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> [...]

This is much clearer and looks good to push modulo a minor typo I noted in a comment.

-------------

Marked as reviewed by adinn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net Mon Nov 15 10:42:38 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 15 Nov 2021 10:42:38 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 10 Nov 2021 15:01:51 GMT, Alan Hayward wrote:

>> src/hotspot/os_cpu/bsd_aarch64/pauth_bsd_aarch64.inline.hpp line 25:
>>
>>> 23: */
>>> 24:
>>> 25: #ifndef OS_CPU_BSD_AARCH64_PAUTH_BSD_AARCH64_INLINE_HPP
>>
>> Are these two files different enough to separate them for BSD and Linux?
>
> My motivation was to avoid having any ifdefs - but we need one anyway for the apple ifdef.
>
> If I merged the two we would end up with just the contents of the BSD version of the file.
>
> There is also the windows version of the file, which for now has empty functions. If PAC in windows is added, that'll either use the same code or maybe Windows will provide an API (like the Apple one). Merging everything would mean windows gains the UseROPProtection check.

>Are these two files different enough to separate them for BSD and Linux?

Merging these files then broke everything for windows (because the asm function is different).
Having a "ifdef apple, elseif windows else" doesn't really make sense, so I'll split the files out again.
-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net Mon Nov 15 11:01:36 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 15 Nov 2021 11:01:36 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 15 Nov 2021 10:20:15 GMT, Andrew Haley wrote:

>whether this function authenticates or strips the address depends only on debugging?

Yes. We only need to strip the value, because we're not jumping to the lr value, only viewing it.

The interface is different to a strip (as we need to pass in the modifier).

How about something like pauth_authenticate_fast() ? or pauth_authenticate_unsafe() ?

Alternatively, this function is only called by the functions in Frame, so the frequency of use is probably low enough (compared to the sign/auth every function) that it's not going to cause any performance issues. So, could just replace with calls to pauth_authenticate. I think that might be the best option.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net Mon Nov 15 11:11:37 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 15 Nov 2021 11:11:37 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 15 Nov 2021 10:58:06 GMT, Alan Hayward wrote:

>> src/hotspot/cpu/aarch64/pauth_aarch64.hpp line 132:
>>
>>> 130: // Authenticate or strip a return value. Use for efficiency and only when the safety of the data
>>> 131: // isn't an issue - for example when viewing the stack.
>>> 132: //
>>
>> So, whether this function authenticates or strips the address depends only on debugging? The vague name makes the callers hard to read.
>
>>whether this function authenticates or strips the address depends only on debugging?
>
> Yes. We only need to strip the value, because we're not jumping to the lr value, only viewing it.
>
> The interface is different to a strip (as we need to pass in the modifier).
>
> How about something like pauth_authenticate_fast() ? or pauth_authenticate_unsafe() ?
>
> Alternatively, this function is only called by the functions in Frame, so the frequency of use is probably low enough (compared to the sign/auth every function) that it's not going to cause any performance issues. So, could just replace with calls to pauth_authenticate. I think that might be the best option.

A simple rule here: function names go with what the release version does. So I'd go with the actual purpose, which is `pauth_strip_addr_for_debuginfo()`. That's right, isn't it? You only want this thing for stack traces, logs, etc.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net Mon Nov 15 11:24:46 2021
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 15 Nov 2021 11:24:46 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To: <81l6r4GfgLq9L4qhlvi_VWKE46vPqhspX-d7NG6Qux0=.4dbf25ed-4c3f-415b-9ffc-ddaf69211cf2@github.com>
References: <81l6r4GfgLq9L4qhlvi_VWKE46vPqhspX-d7NG6Qux0=.4dbf25ed-4c3f-415b-9ffc-ddaf69211cf2@github.com>
Message-ID:

On Mon, 15 Nov 2021 10:15:41 GMT, Andrew Haley wrote:

>> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
>> [...]
>
> src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 452:
>
>> 450: // patch the return address, this stub will directly return to the exception handler
>> 451: __ str(r0, Address(rfp, 1*BytesPerWord));
>> 452:
>
> Please explain the reason for this change, that leaves `lr` live across `restore_live_registers()`.

In the original code:
*save r0 to the lr location on the stack
*restore_live_registers
*Standard return: remove stack frame, load lr and fp off the stack, jump to lr.

With PAC it would now be:
*Sign r0 then save it to the lr location on the stack
*restore_live_registers
*Standard return: remove stack frame, load lr and fp off the stack, auth lr, jump to lr.

After reading the code in restore_live_registers, it doesn't touch lr and so seemed odd to have the save to the stack, only to restore it directly afterwards.
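
For readers following along, the "with PAC" sequence above can be modelled in plain C++. This is only a toy sketch: real PAC signing is done by the hardware with a key the process cannot read, and none of the names below (`toy_pac`, `ToyFrame`, `patch_return_address`, `standard_return`) exist in HotSpot. The 48-bit address width, the key constant and the mixing function are all assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy software model of pointer signing. Every name and constant here is
// hypothetical; real PAC uses a hardware key and a hardware cipher.
namespace toy_pac {

using u64 = std::uint64_t;

constexpr int kAddrBits = 48;                        // assumed virtual address width
constexpr u64 kAddrMask = (u64{1} << kAddrBits) - 1; // low bits hold the address
constexpr u64 kKey      = 0x9E3779B97F4A7C15ULL;     // stand-in for the hardware key

// 16-bit code derived from the address, the modifier and the key.
inline u64 code_for(u64 addr, u64 modifier) {
  u64 x = (addr & kAddrMask) ^ (modifier * 0xFF51AFD7ED558CCDULL) ^ kKey;
  x ^= x >> 33;
  x *= 0xC4CEB9FE1A85EC53ULL;
  x ^= x >> 29;
  return x >> kAddrBits;
}

// "Sign": place the code in the otherwise unused upper bits of the pointer.
inline u64 sign(u64 addr, u64 modifier) {
  return (addr & kAddrMask) | (code_for(addr, modifier) << kAddrBits);
}

// "Authenticate": write the plain address to *out only if the code matches.
inline bool authenticate(u64 signed_addr, u64 modifier, u64* out) {
  const u64 addr = signed_addr & kAddrMask;
  if ((signed_addr >> kAddrBits) != code_for(addr, modifier)) return false;
  *out = addr;
  return true;
}

} // namespace toy_pac

// Models the stub's frame: the slot the return sequence reloads lr from.
struct ToyFrame {
  toy_pac::u64 saved_fp;
  toy_pac::u64 saved_lr;
};

// Step 1: sign the handler address, then patch it into the lr slot.
inline void patch_return_address(ToyFrame& f, toy_pac::u64 handler, toy_pac::u64 modifier) {
  f.saved_lr = toy_pac::sign(handler, modifier);
}

// Step 3: reload lr from the frame and authenticate it before "returning".
inline bool standard_return(const ToyFrame& f, toy_pac::u64 modifier, toy_pac::u64* lr) {
  return toy_pac::authenticate(f.saved_lr, modifier, lr);
}
```

In this model, a slot written by `patch_return_address` authenticates again in `standard_return` with the same modifier, whereas flipping any of the upper code bits in `saved_lr` makes the check fail. That is the property the patched return address has to keep once the normal return path starts authenticating lr.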
-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net Mon Nov 15 11:33:41 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 15 Nov 2021 11:33:41 GMT
Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4]
In-Reply-To:
References: <81l6r4GfgLq9L4qhlvi_VWKE46vPqhspX-d7NG6Qux0=.4dbf25ed-4c3f-415b-9ffc-ddaf69211cf2@github.com>
Message-ID: <1MtnvG48AfLFiyinjPMaT6KJ1MdM15mp2k2UMrryCgk=.7d91bba7-f1a7-4a26-8a4d-e1388a8b88ea@github.com>

On Mon, 15 Nov 2021 11:21:37 GMT, Alan Hayward wrote:

>> src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 452:
>>
>>> 450: // patch the return address, this stub will directly return to the exception handler
>>> 451: __ str(r0, Address(rfp, 1*BytesPerWord));
>>> 452:
>>
>> Please explain the reason for this change, that leaves `lr` live across `restore_live_registers()`.
>
> In the original code:
> *save r0 to the lr location on the stack
> *restore_live_registers
> *Standard return: remove stack frame, load lr and fp off the stack, jump to lr.
>
> With PAC it would now be:
> *Sign r0 then save it to the lr location on the stack
> *restore_live_registers
> *Standard return: remove stack frame, load lr and fp off the stack, auth lr, jump to lr.
>
> After reading the code in restore_live_registers, it doesn't touch lr and so seemed odd to have the save to the stack, only to restore it directly afterwards.

That's an optimization, though. You shouldn't need to read the code in `restore_live_registers()` to see if it's safe to keep the return address in LR: at best it's pathological coupling, in the sense that the correctness of this code depends on the internal details of `restore_live_registers()`. Let's keep LR live ranges as short as possible.
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Mon Nov 15 11:40:39 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 15 Nov 2021 11:40:39 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4] In-Reply-To: <1MtnvG48AfLFiyinjPMaT6KJ1MdM15mp2k2UMrryCgk=.7d91bba7-f1a7-4a26-8a4d-e1388a8b88ea@github.com> References: <81l6r4GfgLq9L4qhlvi_VWKE46vPqhspX-d7NG6Qux0=.4dbf25ed-4c3f-415b-9ffc-ddaf69211cf2@github.com> <1MtnvG48AfLFiyinjPMaT6KJ1MdM15mp2k2UMrryCgk=.7d91bba7-f1a7-4a26-8a4d-e1388a8b88ea@github.com> Message-ID: On Mon, 15 Nov 2021 11:30:35 GMT, Andrew Haley wrote: >> In the original code: >> *save r0 to the lr location on the stack >> *restore_live_registers >> *Standard return: remove stack frame, load lr and fp off the stack, jump to lr. >> >> With PAC it would now be: >> *Sign r0 then save it to the lr location on the stack >> *restore_live_registers >> *Standard return: remove stack frame, load lr and fp off the stack, auth lr, jump to lr. >> >> After reading the code in restore_live_registers, it doesn't touch lr and so seemed odd to have the save to the stack, only to restore it directly afterwards. > > That's an optimization, though. You shouldn't need to read the code in `restore_live_registers()` to see if it's safe to keep the return address in LR: at best it's pathological coupling, in the sense that the correctness of this code depends on the internal details of `restore_live_registers()`. Let's keep LR live ranges as short as possible. Ok, that's fine, I'll update it (It'll simplify the total code diff too). 
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From adinn at openjdk.java.net Mon Nov 15 11:56:47 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Mon, 15 Nov 2021 11:56:47 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4] In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 11:08:57 GMT, Andrew Haley wrote: >>>whether this function authenticates or strips the address depends only on debugging? >> >> Yes. We only need to strip the value, because we're not jumping to the lr value, only viewing it. >> >> The interface is different to a strip (as we need to pass in the modifier). >> >> How about something like pauth_authenticate_fast() ? or pauth_authenticate_unsafe() ? >> >> Alternatively, this function is only called by the functions in Frame, so the frequency of use is probably low enough (compared to the sign/auth every function) that it's not going to cause any performance issues. So, could just replace with calls to pauth_authenticate. I think that might be the best option. > > A simple rule here: function names go with what the release version does. So I'd go with the actual purpose, which is `pauth_strip_addr_for_debuginfo()`. That's right, isn't it? You only want this thing for stack traces, logs, etc. This function is used by the frame code. So, that means it is used for all stack walks which are far from being simply cosmetic/ornamental. The runtime will rely on this for various different types of thread housekeeping. The difference here is that in product mode this simply strips auth bits whereas in debug mode it actually authenticates as it strips to give extra verification. So, your suggested name is quite misleading. Likewise Alan's suggested names are misleading because the primary product operation is to strip not authenticate. How about pauth_strip_verifiable? and a comment saying that it differs from pauth_strip by actually authenticating when debug is enabled.
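The strip-versus-authenticate distinction discussed above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HotSpot code: real PAC relies on hardware instructions and keys, whereas here the "signature" is a made-up 8-bit tag in the top byte of a 64-bit value. The helpers `toy_tag`, `toy_sign`, `toy_strip` and `toy_authenticate` and the constant `kTagShift` are hypothetical; only the name `pauth_strip_verifiable` comes from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: the "signature" is a hypothetical 8-bit tag kept in the
// top byte of the value. Real pointer authentication works differently.
constexpr int kTagShift = 56;
constexpr uint64_t kAddrMask = (uint64_t{1} << kTagShift) - 1;

// Hypothetical tag function: mixes the address with a modifier
// (for a return address, the modifier might be a frame pointer).
inline uint64_t toy_tag(uint64_t addr, uint64_t modifier) {
  uint64_t h = (addr ^ modifier) * 0x9E3779B97F4A7C15ull;
  return h >> kTagShift;  // keep the top 8 bits of the mixed value
}

// Attach the tag to an address.
inline uint64_t toy_sign(uint64_t addr, uint64_t modifier) {
  uint64_t plain = addr & kAddrMask;
  return plain | (toy_tag(plain, modifier) << kTagShift);
}

// Remove the tag without checking it.
inline uint64_t toy_strip(uint64_t signed_addr) {
  return signed_addr & kAddrMask;
}

// Check that the tag matches the address and modifier.
inline bool toy_authenticate(uint64_t signed_addr, uint64_t modifier) {
  return (signed_addr >> kTagShift) == toy_tag(toy_strip(signed_addr), modifier);
}

// Strips in every build. When assertions are enabled (a stand-in for a
// debug build) it also verifies the tag before stripping.
inline uint64_t pauth_strip_verifiable(uint64_t signed_addr, uint64_t modifier) {
  (void)modifier;  // unused when assertions are compiled out
  assert(toy_authenticate(signed_addr, modifier) && "address failed verification");
  return toy_strip(signed_addr);
}
```

In this model a caller that only inspects a return address (a stack walker, say) calls `pauth_strip_verifiable(value, modifier)`; building with `NDEBUG` reduces it to a plain strip, which matches the "name follows the product behaviour" argument made in the review.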
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From chagedorn at openjdk.java.net Mon Nov 15 12:59:36 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 15 Nov 2021 12:59:36 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information [v4] In-Reply-To: References: Message-ID: On Fri, 12 Nov 2021 20:40:07 GMT, Dean Long wrote: >> The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: >> 1. added a version number to the replay file >> 2. removed unused ci fields >> 3. corrected comment in TestLambdas.java > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > strengthen version check > > I guess we could leave this in for old replay files with the initialization further down in ciReplay::initialize() if _version < 1. > > Yes, it's necessary to parse the value for old replay files, but the value is never used. I'm not sure what you are suggesting about the initialization further down. Ok, I was missing that the parsed value has no effect anyways. Then you do not need this code for the initialization on L1416 for `_version < 1`: m->_current_mileage = rec->_current_mileage; > > However, thinking again about this, it should not happen that we parse a version number that's not supported > > A user could be using an older JDK but accidentally try a newer replay file. That was the scenario I had in mind. That's a valid point for emitting a warning instead. Changes look good to me! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6344 From jai.forums2013 at gmail.com Mon Nov 15 13:45:13 2021 From: jai.forums2013 at gmail.com (Jaikiran Pai) Date: Mon, 15 Nov 2021 19:15:13 +0530 Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore?
In-Reply-To: References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com> <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com> <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com> <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com> Message-ID: <468cdc88-86de-8fde-a1b7-44837b9453a5@gmail.com> The way I used to use this previously was more for convenience than anything more. Very specifically, I used to do something like this: - Work on some bug fix with latest JDK master source repo. - Add a jtreg test to verify the fix - Send out a PR and wait for reviews - On some occasions, the review suggestions include relatively big changes to the jtreg test case. In such cases, I used to do those changes in the test, verify that the test still continues to pass. However, I would even want to make sure the test still reproduces the original issue. So instead of git reverting only the source code changes, building the current JDK again and then running the updated test, I would just point the jtreg run to a different, older version of a JDK (which wouldn't have the fix) by using the -jdk:. I would then expect the test to fail with the expected issue. It was just a convenience than anything more. -Jaikiran On 13/11/21 1:26 pm, Thomas Stüfe wrote: > Maybe the easiest way for you would be to get the source drop matching the > binary JDK from the vendor of your JDK. Since you may also have > vendor-specific changes (albeit rare, it's possible). > > Cheers, Thomas > > > On Sat, Nov 13, 2021 at 7:38 AM David Holmes > wrote: > >> On 13/11/2021 3:37 pm, Jaikiran Pai wrote: >>> I got past this with an extensive workaround for now. I moved/copied >>> that test case java file outside of the JDK source tree, then created a >>> new/custom TEST.ROOT which is very minimal and has no reference to >>> whitebox for bootlibs, then made sure the jtwork directory is also >>> outside of the JDK source tree (so that the test is compiled afresh) and >>> then ran that test.
That helped, but it's only for this test since its >>> requirements in the test are very minimal. I don't see a way to get past >>> this if I have to run the wider range of jtreg tests that reside in the >>> JDK source tree against a pre-built/downloaded Java 17 or any previous >>> versions. >> Basically you're not supposed to do that. You have to test a given >> binary with the tests that existed when that binary was built. Many >> things in the tests can change that will fail to run with an older JDK. >> >> In theory you can use the build number of the binary JDK to check out the >> tests corresponding to that build using the appropriate build tag. >> >> Cheers, >> David >> >>> -Jaikiran >>> >>> On 13/11/21 10:26 am, Jaikiran Pai wrote: >>>> Hello Leonid, >>>> >>>> On 13/11/21 9:38 am, Leonid Mesnik wrote: >>>>> Hi >>>>> >>>>> It is a hotspot testing problem rather than a jtreg problem. So I've >>>>> added >>>>> hotspot-dev at openjdk.java.net >> alias. >>>> Thank you for adding the right list. >>>>> ... >>>>> Could you please check that you use exactly the same sources during >>>>> testing which have been used to build JDK. >>>> Do you mean the sources of the JDK against which the test is being >>>> run? I don't have those sources since this test runs against a >>>> pre-built binary downloaded from https://jdk.java.net/17/ >>>> >>>> -Jaikiran >>>> From duke at openjdk.java.net Mon Nov 15 13:59:40 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 15 Nov 2021 13:59:40 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v4] In-Reply-To: References: Message-ID: <8L4qvqk-eda9UU1BCA3i4yf3JVaJ0UMJxTDDLCO8XKg=.fc1bad40-8925-480d-a8a3-33f9c7650315@github.com> On Mon, 15 Nov 2021 11:54:09 GMT, Andrew Dinn wrote: > pauth_strip_verifiable That name works for me.
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From hseigel at openjdk.java.net Mon Nov 15 14:58:49 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 15 Nov 2021 14:58:49 GMT Subject: RFR: 8276795: Deprecate seldom used CDS flags Message-ID: Please review this small change to deprecate seldom used CDS flags. The flags will be deprecated in 18, obsoleted in 19, and removed in a later release. The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. Thanks, Harold ------------- Commit messages: - 8276795: Deprecate seldom used CDS flags Changes: https://git.openjdk.java.net/jdk/pull/6390/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6390&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276795 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6390/head:pull/6390 PR: https://git.openjdk.java.net/jdk/pull/6390 From eosterlund at openjdk.java.net Mon Nov 15 15:31:06 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Mon, 15 Nov 2021 15:31:06 GMT Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v3] In-Reply-To: References: Message-ID: > There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this: > > 1. full_gc() > 2. final_allocation_attempt() > > And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw.
This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps. > > The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations. > > The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point. > > Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically). Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge branch 'master' into 8259643_load_unload_bug - polish code alignment and rename register/unregister to add/remove - 8259643: ZGC can return metaspace OOM prematurely ------------- Changes: https://git.openjdk.java.net/jdk/pull/2289/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=02 Stats: 298 lines in 6 files changed: 276 ins; 17 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2289.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2289/head:pull/2289 PR: https://git.openjdk.java.net/jdk/pull/2289 From eosterlund at openjdk.java.net Mon Nov 15 15:47:51 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Mon, 15 Nov 2021 15:47:51 GMT Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v3] In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 15:31:06 GMT, Erik Österlund wrote: >> There exists a race condition for ZGC metaspace allocations, where an allocation
can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this: >> >> 1. full_gc() >> 2. final_allocation_attempt() >> >> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps. >> >> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations. >> >> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point. >> >> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically). > > Erik Österlund has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge branch 'master' into 8259643_load_unload_bug > - polish code alignment and rename register/unregister to add/remove > - 8259643: ZGC can return metaspace OOM prematurely Sorry I ran out of steam with this patch a few months ago. Looks like I already had 3 reviews so I think I am ready to go.
I rebased with the latest mainline, which involved just a small fix to what kind of lock (not safepoint checking lock with new rank) is used due to all the lock ranking changes as of late. ------------- PR: https://git.openjdk.java.net/jdk/pull/2289 From eosterlund at openjdk.java.net Mon Nov 15 16:21:06 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Mon, 15 Nov 2021 16:21:06 GMT Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v4] In-Reply-To: References: Message-ID: <6fQXcaACVi0FL90kaNFWUHJ59Ki5eq_06AC2QCY1ZeU=.9dfc8d8f-2f02-4222-875e-67b226030f16@github.com> > There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this: > > 1. full_gc() > 2. final_allocation_attempt() > > And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps. > > The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations. > > The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point.
> > Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically). Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: lock rank update ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2289/files - new: https://git.openjdk.java.net/jdk/pull/2289/files/1f45fa7f..012603f4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2289.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2289/head:pull/2289 PR: https://git.openjdk.java.net/jdk/pull/2289 From eosterlund at openjdk.java.net Mon Nov 15 16:43:19 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Mon, 15 Nov 2021 16:43:19 GMT Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v5] In-Reply-To: References: Message-ID: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com> > There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this: > > 1. full_gc() > 2. final_allocation_attempt() > > And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
> > The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations. > > The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point. > > Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically). Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: style polish in ZGC code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2289/files - new: https://git.openjdk.java.net/jdk/pull/2289/files/012603f4..9c6f1041 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2289.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2289/head:pull/2289 PR: https://git.openjdk.java.net/jdk/pull/2289 From pliden at openjdk.java.net Mon Nov 15 16:43:22 2021 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 15 Nov 2021 16:43:22 GMT Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v5] In-Reply-To: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com> References: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com> Message-ID: On Mon, 15 Nov 2021 16:40:26 GMT, Erik Österlund wrote: >> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to
unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this: >> >> 1. full_gc() >> 2. final_allocation_attempt() >> >> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps. >> >> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations. >> >> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point. >> >> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically). > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > style polish in ZGC code Still looks good. ------------- Marked as reviewed by pliden (Reviewer).
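The "critical allocation queue" idea in the quoted PR description can be sketched as a toy, single-threaded model. This is an illustration under stated assumptions and not the HotSpot implementation: `ToyMetaspace` and all of its members are hypothetical names, locking is omitted, and memory is just a byte counter. It shows only the ordering rule: once a request is queued as critical, normal allocations back off until the queue has been served.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy, single-threaded model of a critical allocation queue.
// Hypothetical names throughout; no locking, no real memory.
struct ToyMetaspace {
  std::size_t free_bytes;            // bytes currently available
  std::deque<std::size_t> critical;  // sizes of waiting critical requests (FIFO)

  // Take `size` bytes if they are available.
  bool try_take(std::size_t size) {
    if (size > free_bytes) return false;
    free_bytes -= size;
    return true;
  }

  // Normal allocations refuse to run while critical requests are waiting,
  // so they cannot consume memory that a GC has just released for them.
  bool allocate_normal(std::size_t size) {
    if (!critical.empty()) return false;
    return try_take(size);
  }

  // A thread whose allocation failed queues a critical request.
  void add_critical(std::size_t size) { critical.push_back(size); }

  // Models memory released by a GC cycle (step 1 in the description).
  void release(std::size_t freed) { free_bytes += freed; }

  // Serves waiting requests in order (step 2); stops at the first request
  // that still does not fit. Returns the number of requests served.
  std::size_t process_critical() {
    std::size_t served = 0;
    while (!critical.empty() && try_take(critical.front())) {
      critical.pop_front();
      ++served;
    }
    return served;
  }
};
```

In this model the window between `release()` and `process_critical()` corresponds to the preemption window between steps 1 and 2: a normal allocation arriving in that window is turned away instead of filling the space again. A real implementation would presumably block or retry the normal allocation rather than fail it.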
PR: https://git.openjdk.java.net/jdk/pull/2289 From coleenp at openjdk.java.net Mon Nov 15 16:52:37 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 15 Nov 2021 16:52:37 GMT Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v5] In-Reply-To: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com> References: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com> Message-ID: <_k1X1jd1-Em51dcazX20TXmjagqTVc7MnaUWIFtIwk4=.0e106ef7-010d-4031-ac8f-d3b5d783847d@github.com> On Mon, 15 Nov 2021 16:43:19 GMT, Erik Österlund wrote: >> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this: >> >> 1. full_gc() >> 2. final_allocation_attempt() >> >> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps. >> >> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations. >> >> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point.
>> >> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically). > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > style polish in ZGC code src/hotspot/share/runtime/mutexLocker.cpp line 248: > 246: > 247: def(Metaspace_lock , PaddedMutex , nosafepoint-3); > 248: def(MetaspaceCritical_lock , PaddedMonitor, nosafepoint-1, true); You don't need the true parameter. That's the default for nosafepoint locks. ------------- PR: https://git.openjdk.java.net/jdk/pull/2289 From stuefe at openjdk.java.net Mon Nov 15 17:52:34 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 15 Nov 2021 17:52:34 GMT Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v5] In-Reply-To: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com> References: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com> Message-ID: On Mon, 15 Nov 2021 16:43:19 GMT, Erik Österlund wrote: >> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this: >> >> 1. full_gc() >> 2. final_allocation_attempt() >> >> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
>> >> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations. >> >> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point. >> >> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically). > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > style polish in ZGC code Nice to have you back. Change looks still good. Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/2289 From dlong at openjdk.java.net Mon Nov 15 21:12:43 2021 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 15 Nov 2021 21:12:43 GMT Subject: RFR: 8276095: ciReplay: replay failure due to incomplete ciMethodData information [v4] In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 12:56:35 GMT, Christian Hagedorn wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> strengthen version check > >> > I guess we could leave this in for old replay files with the initialization further down in ciReplay::initialize() if _version < 1. >> >> Yes, it's necessary to parse the value for old replay files, but the value is never used. I'm not sure what you are suggesting about the initialization further down. > > Ok, I was missing that the parsed value has no effect anyways.
Then you do not need this code for the initialization on L1416 for `_version < 1`: > > m->_current_mileage = rec->_current_mileage; > > >> > However, thinking again about this, it should not happen that we parse a version number that's not supported >> >> A user could be using an older JDK but accidentally try a newer replay file. That was the scenario I had in mind. > > That's a valid point for emitting a warning instead. > > Changes look good to me! Thanks @chhagedorn and @vnkozlov. ------------- PR: https://git.openjdk.java.net/jdk/pull/6344 From dlong at openjdk.java.net Mon Nov 15 21:12:43 2021 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 15 Nov 2021 21:12:43 GMT Subject: Integrated: 8276095: ciReplay: replay failure due to incomplete ciMethodData information In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 03:28:40 GMT, Dean Long wrote: > The replay data was missing MethodData::_invocation_counter. Adding it seems to fix the problem. @rwestrel please verify if it works for you. Also, with this change: > 1. added a version number to the replay file > 2. removed unused ci fields > 3. corrected comment in TestLambdas.java This pull request has now been integrated.
Changeset: 9326eb14 Author: Dean Long URL: https://git.openjdk.java.net/jdk/commit/9326eb14617bf08e3376f854fc022e11d1ef34dd Stats: 63 lines in 8 files changed: 26 ins; 27 del; 10 mod 8276095: ciReplay: replay failure due to incomplete ciMethodData information Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/6344 From psandoz at openjdk.java.net Mon Nov 15 21:51:46 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 15 Nov 2021 21:51:46 GMT Subject: Integrated: 8271515: Integration of JEP 417: Vector API (Third Incubator) In-Reply-To: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com> References: <_QQ9ntdJJfzVcAGrbjev0ZM-xNfD4wNATphnXkb-Y00=.bbf46985-8776-4dda-ada5-b15ab50774aa@github.com> Message-ID: On Fri, 8 Oct 2021 21:25:26 GMT, Paul Sandoz wrote: > This PR improves the performance of vector operations that accept masks on architectures that support masking in hardware, specifically Intel AVX512 and ARM SVE. > > On architectures that do not support masking in hardware the same technique as before is applied to most operations, specifically composition using blend. > > Masked loads/stores are a special form of masked operation that require additional care to ensure out-of-bounds access throw exceptions. The range checking has not been fully optimized and will require further work. > > No API enhancements were required and only a few additional tests were needed. This pull request has now been integrated. 
Changeset: a59c9b2a Author: Paul Sandoz URL: https://git.openjdk.java.net/jdk/commit/a59c9b2ac277d6ff6be1700d91ff389f137e61ca Stats: 21982 lines in 104 files changed: 16217 ins; 2087 del; 3678 mod 8271515: Integration of JEP 417: Vector API (Third Incubator) Co-authored-by: Sandhya Viswanathan Co-authored-by: Jatin Bhateja Co-authored-by: Ningsheng Jian Co-authored-by: Xiaohong Gong Co-authored-by: Eric Liu Co-authored-by: Jie Fu Co-authored-by: Vladimir Ivanov Co-authored-by: John R Rose Co-authored-by: Paul Sandoz Co-authored-by: Rado Smogura Reviewed-by: kvn, sviswanathan, ngasson ------------- PR: https://git.openjdk.java.net/jdk/pull/5873 From david.holmes at oracle.com Mon Nov 15 21:59:28 2021 From: david.holmes at oracle.com (David Holmes) Date: Tue, 16 Nov 2021 07:59:28 +1000 Subject: jtreg cannot be run against a pre-built/downloaded JDK anymore? In-Reply-To: <468cdc88-86de-8fde-a1b7-44837b9453a5@gmail.com> References: <5405e633-00b4-949f-a982-d1057187d21a@gmail.com> <57f314df-33fd-1af1-8468-19d17b6d69ad@gmail.com> <680a2669-d5f8-a596-e3ad-071b9fb66233@gmail.com> <1102c99b-e4e7-5862-fd81-cb39af4c3a81@oracle.com> <468cdc88-86de-8fde-a1b7-44837b9453a5@gmail.com> Message-ID: <98cb4719-387d-fe60-e368-e9805917dba6@oracle.com> On 15/11/2021 11:45 pm, Jaikiran Pai wrote: > The way I used to use this previously was more for convenience than > anything more. Very specifically, I used to do something like this: > > - Work on some bug fix with latest JDK master source repo. > > - Add a jtreg test to verify the fix > > - Send out a PR and wait for reviews > > - On some occasions, the review suggestions include relatively big > changes to the jtreg test case. In such cases, I used to do those > changes in the test, verify that the test still continues to pass. > However, I would even want to make sure the test still reproduces the > original issue. 
So instead of git reverting only the source code
> changes, building the current JDK again and then running the updated
> test, I would just point the jtreg run to a different, older version of
> a JDK (which wouldn't have the fix) by using the
> -jdk:. I would then expect the test to fail with
> the expected issue.
>
> It was just a convenience more than anything.

Sure and most of the time that will work. But if the test relies on something that is only present in the later JDK binary then obviously the test will fail.

Cheers,
David

> -Jaikiran
>
> On 13/11/21 1:26 pm, Thomas Stüfe wrote:
>> Maybe the easiest way for you would be to get the source drop matching
>> the
>> binary JDK from the vendor of your JDK. Since you may also have
>> vendor-specific changes (albeit rare, it's possible).
>>
>> Cheers, Thomas
>>
>>
>> On Sat, Nov 13, 2021 at 7:38 AM David Holmes
>> wrote:
>>
>>> On 13/11/2021 3:37 pm, Jaikiran Pai wrote:
>>>> I got past this with an extensive workaround for now. I moved/copied
>>>> that test case java file outside of the JDK source tree, then created a
>>>> new/custom TEST.ROOT which is very minimal and has no reference to
>>>> whitebox for bootlibs, then made sure the jtwork directory is also
>>>> outside of the JDK source tree (so that the test is compiled afresh)
>>>> and
>>>> then ran that test. That helped, but it's only for this test since its
>>>> requirements in the test are very minimal. I don't see a way to get
>>>> past
>>>> this if I have to run the wider range of jtreg tests that reside in the
>>>> JDK source tree against a pre-built/downloaded Java 17 or any previous
>>>> versions.
>>> Basically you're not supposed to do that. You have to test a given
>>> binary with the tests that existed when that binary was built. Many
>>> things in the tests can change that will fail to run with an older JDK.
>>>
>>> In theory you can use the build number of the binary JDK to check out the
>>> tests corresponding to that build using the appropriate build tag.
>>>
>>> Cheers,
>>> David
>>>
>>>> -Jaikiran
>>>>
>>>> On 13/11/21 10:26 am, Jaikiran Pai wrote:
>>>>> Hello Leonid,
>>>>>
>>>>> On 13/11/21 9:38 am, Leonid Mesnik wrote:
>>>>>> Hi
>>>>>>
>>>>>> It is a hotspot testing problem rather than a jtreg problem. So I've
>>>>>> added
>>>>>> hotspot-dev at openjdk.java.net alias.
>>>>> Thank you for adding the right list.
>>>>>> ...
>>>>>> Could you please check that you use exactly the same sources during
>>>>>> testing which have been used to build JDK.
>>>>> Do you mean the sources of the JDK against which the test is being
>>>>> run? I don't have those sources since this test runs against a
>>>>> pre-built binary downloaded from https://jdk.java.net/17/
>>>>>
>>>>> -Jaikiran
>>>>>

From dholmes at openjdk.java.net Mon Nov 15 22:37:36 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 15 Nov 2021 22:37:36 GMT
Subject: RFR: 8276795: Deprecate seldom used CDS flags
In-Reply-To:
References:
Message-ID:

On Mon, 15 Nov 2021 14:50:43 GMT, Harold Seigel wrote:

> Please review this small change to deprecate seldom used CDS flags. The flags will be deprecated in 18, obsoleted in 19, and removed in a later release.
>
> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
>
> Thanks, Harold

Hi Harold,

You also need to add "(Deprecated)" to the description of these flags in globals.hpp.

You should also add these flags to the VMDeprecatedOptions.java test.

Thanks,
David

-------------

Changes requested by dholmes (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6390 From stuefe at openjdk.java.net Tue Nov 16 06:59:54 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 16 Nov 2021 06:59:54 GMT Subject: RFR: JDK-8277172: Remove stray comment mentioning instr_size_for_decode_klass_not_null on x64 Message-ID: Trivial cleanup. https://bugs.openjdk.java.net/browse/JDK-8241825 removed `instr_size_for_decode_klass_not_null()` on x64 but left a comment in place. Remove obsolete comment. ------------- Commit messages: - remove comment Changes: https://git.openjdk.java.net/jdk/pull/6384/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6384&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277172 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6384.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6384/head:pull/6384 PR: https://git.openjdk.java.net/jdk/pull/6384 From dholmes at openjdk.java.net Tue Nov 16 07:43:40 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 16 Nov 2021 07:43:40 GMT Subject: RFR: JDK-8277172: Remove stray comment mentioning instr_size_for_decode_klass_not_null on x64 In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 09:03:58 GMT, Thomas Stuefe wrote: > Trivial cleanup. > > https://bugs.openjdk.java.net/browse/JDK-8241825 removed `instr_size_for_decode_klass_not_null()` on x64 but left a comment in place. Remove obsolete comment. Looks good and trivial. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6384 From stuefe at openjdk.java.net Tue Nov 16 07:52:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 16 Nov 2021 07:52:41 GMT Subject: RFR: JDK-8277172: Remove stray comment mentioning instr_size_for_decode_klass_not_null on x64 In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 07:41:00 GMT, David Holmes wrote: > Looks good and trivial. > > Thanks, David Thanks David. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6384 From stuefe at openjdk.java.net Tue Nov 16 07:52:42 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 16 Nov 2021 07:52:42 GMT Subject: Integrated: JDK-8277172: Remove stray comment mentioning instr_size_for_decode_klass_not_null on x64 In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 09:03:58 GMT, Thomas Stuefe wrote: > Trivial cleanup. > > https://bugs.openjdk.java.net/browse/JDK-8241825 removed `instr_size_for_decode_klass_not_null()` on x64 but left a comment in place. Remove obsolete comment. This pull request has now been integrated. Changeset: 7719a74c Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/7719a74cec8c47fd036226b520a5fce7887386da Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8277172: Remove stray comment mentioning instr_size_for_decode_klass_not_null on x64 Reviewed-by: dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/6384 From duke at openjdk.java.net Tue Nov 16 08:22:22 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Tue, 16 Nov 2021 08:22:22 GMT Subject: RFR: 8264130: PAC-RET protection for Linux/AArch64 [v5] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. 
By compiling for ARMv8.0, GCC and LLVM will only use the set of
> PAC instructions that exist in the NOP space - on hardware without PAC,
> these instructions act as NOPs, allowing backward compatibility for
> negligible performance cost (2 NOPs per non-leaf function).
>
> Hardware is currently limited to the Apple M1 MacBooks. All testing has
> been done within a Fedora Docker image. A run of SpecJVM showed no
> difference to that of noise - which was surprising.
>
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
>
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
>
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of them
> and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
>
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.

Alan Hayward has updated the pull request incrementally with three additional commits since the last revision:

 - Rename pauth_authenticate_or_strip_return_address
 - Fix windows aarch64 by restoring pauth file split
 - Don't keep LR live across restore_live_registers

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6334/files
  - new: https://git.openjdk.java.net/jdk/pull/6334/files/2c27eb5e..dbd6bda2

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=03-04

  Stats: 318 lines in 6 files changed: 233 ins; 70 del; 15 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6334.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334

PR: https://git.openjdk.java.net/jdk/pull/6334

From eosterlund at openjdk.java.net Tue Nov 16 08:38:05 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 16 Nov 2021 08:38:05 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v6]
In-Reply-To:
References:
Message-ID: <9kt7RPhsAqWys4i2qaLwebf28SQdYzU7uhdefuQtVvQ=.9bbb723e-e2ec-4c57-ae58-64e57f46aa9f@github.com>

> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this:
>
> 1. full_gc()
> 2. final_allocation_attempt()
>
> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps.
>
> The way we deal with this particular issue for heap allocations is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations that get precedence over normal metaspace allocations.
>
> The solution should work for other concurrent GCs (which likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point.
>
> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically).

Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

  return bool for Coleen

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2289/files
  - new: https://git.openjdk.java.net/jdk/pull/2289/files/9c6f1041..df0cdc87

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2289&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2289.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2289/head:pull/2289

PR: https://git.openjdk.java.net/jdk/pull/2289

From eosterlund at openjdk.java.net Tue Nov 16 08:38:07 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 16 Nov 2021 08:38:07 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v5]
In-Reply-To:
References: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com>
Message-ID:

On Mon, 15 Nov 2021 16:39:22 GMT, Per Liden wrote:

>> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>>
>> style polish in ZGC code
>
> Still looks good.

Thanks for the reviews @pliden @tstuefe and @coleenp!

-------------

PR: https://git.openjdk.java.net/jdk/pull/2289

From eosterlund at openjdk.java.net Tue Nov 16 08:38:13 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 16 Nov 2021 08:38:13 GMT
Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v5]
In-Reply-To: <_k1X1jd1-Em51dcazX20TXmjagqTVc7MnaUWIFtIwk4=.0e106ef7-010d-4031-ac8f-d3b5d783847d@github.com>
References: <2xi37tsSVLQx5aVD3A5spaHygDDEOUKQTmh4cScBDjg=.7d053595-5a0f-43d8-b518-f0264c8e17e3@github.com> <_k1X1jd1-Em51dcazX20TXmjagqTVc7MnaUWIFtIwk4=.0e106ef7-010d-4031-ac8f-d3b5d783847d@github.com>
Message-ID:

On Mon, 15 Nov 2021 16:49:31 GMT, Coleen Phillimore wrote:

>> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>>
>> style polish in ZGC code
>
> src/hotspot/share/runtime/mutexLocker.cpp line 248:
>
>> 246:
>> 247: def(Metaspace_lock , PaddedMutex , nosafepoint-3);
>> 248: def(MetaspaceCritical_lock , PaddedMonitor, nosafepoint-1, true);
>
> You don't need the true parameter. That's the default for nosafepoint locks.

Oh, okay. Fixed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2289

From eosterlund at openjdk.java.net Tue Nov 16 08:51:47 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Tue, 16 Nov 2021 08:51:47 GMT
Subject: RFR: 8266368: Inaccurate after_unwind hook in C2 exception handler
Message-ID:

When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C. The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code.
So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C. ------------- Commit messages: - 8266368: Inaccurate after_unwind hook in C2 exception handler Changes: https://git.openjdk.java.net/jdk/pull/6405/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6405&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266368 Stats: 12 lines in 2 files changed: 5 ins; 5 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6405.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6405/head:pull/6405 PR: https://git.openjdk.java.net/jdk/pull/6405 From xxinliu at amazon.com Tue Nov 16 09:44:13 2021 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 16 Nov 2021 01:44:13 -0800 Subject: Is it necessary to check subtype for invokeinterface of private method? Message-ID: <3d8e3679-30db-ad9c-d68a-6184fd2d008c@amazon.com> Hi, I am working on the regex performance in JDK-8274983. Even though it begins with regex, the problem boils down to invokeinterface to the private interface methods. Before hidden class (JDK-8238358), lambda meta factory generates invokespecial for them. Now it generates invokeinterface instead. C1 doesn't recognize the new code pattern and generates an ic virtual call for the callsite. If many classes all implement a common interface, they trash the ic stub because the concrete classes are different. InvokePrivateInterfaceMethod.java with -XX:+TraceCallFixup can reveal this pathological slowness. Is it the intentional behavior of C1? I see that C2 actually generates checkcast code sequence for this case. I would like to patch up C1 because C1 plays an important role for the startup time. I have a patch to let C1 treats invokeinterface private interface methods as invokespecial. In other words, I treat the private interface methods as effective final. 
It runs pretty well until I encounter the regression ./test/jdk/java/lang/invoke/PrivateInterfaceCall.java. That leads me to the second question.

In my understanding, the code unsafeCastI2() is essentially a typecast of a function pointer, isn't it? My take on "unsafe" in "unsafeCastI2" is that its behavior is undefined, so why do we need to check for ICCE here?

    System.out.println("ICCE PrivateInterfaceCall.invokeDirect D1");
    shouldThrowICCE(() -> PrivateInterfaceCall.invokeDirect(unsafeCastI2(new D1())));

    static I2 unsafeCastI2(Object obj) {
        try {
            MethodHandle mh = MethodHandles.identity(Object.class);
            mh = MethodHandles.explicitCastArguments(mh, mh.type().changeReturnType(I2.class));
            return (I2)mh.invokeExact((Object) obj);
        } catch (Throwable e) {
            throw new Error(e);
        }
    }

In the real world, how meaningful is it to detect the error when we accidentally invoke a private interface method on an object that doesn't actually implement that interface? I think it's only possible via methodhandle and jasm, right? If we say it's undefined behavior, I think we can skip the type check. It would make lambda code faster.

thanks,
--lx

From coleenp at openjdk.java.net Tue Nov 16 13:36:49 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 16 Nov 2021 13:36:49 GMT
Subject: RFR: 8276177: nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with "assert(def_ik->is_being_redefined()) failed: should be being redefined to get here"
Message-ID:

The boolean AND was sharing a flag with another thread, so the value of _misc_is_being_redefined was being set and reset with the other thread. Moved the bit to AccessFlags which has space and an atomic set operation.

Tested with tier1-6, 7-8 in progress.
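
The lost-update problem described above can be sketched in Java. This is only an illustration under stated assumptions, not the HotSpot change itself (which is C++ and uses AccessFlags): the class `FlagBits`, its methods, and the bit names `BEING_REDEFINED` and `OTHER_STATE` are hypothetical. The point is that when several status bits share one word, a plain read-modify-write such as `flags |= bit` from two threads can overwrite the other thread's update, whereas an atomic update cannot.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: several status bits packed into one shared word. */
final class FlagBits {
    // Bit names are illustrative only; they do not mirror HotSpot's layout.
    static final int BEING_REDEFINED = 1 << 0;
    static final int OTHER_STATE     = 1 << 1;

    private final AtomicInteger bits = new AtomicInteger();

    /** Atomically ORs the bit in, so a concurrent update to another bit is not lost. */
    void set(int bit) {
        bits.getAndUpdate(old -> old | bit);
    }

    /** Atomically clears the bit, leaving all other bits untouched. */
    void clear(int bit) {
        bits.getAndUpdate(old -> old & ~bit);
    }

    boolean isSet(int bit) {
        return (bits.get() & bit) != 0;
    }

    public static void main(String[] args) throws InterruptedException {
        FlagBits flags = new FlagBits();
        // One thread keeps toggling an unrelated bit in the same word ...
        Thread toggler = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                flags.set(OTHER_STATE);
                flags.clear(OTHER_STATE);
            }
        });
        toggler.start();
        // ... while this thread sets the bit that a later check relies on.
        flags.set(BEING_REDEFINED);
        toggler.join();
        System.out.println("being redefined: " + flags.isSet(BEING_REDEFINED));
    }
}
```

With a plain `int` field instead of `AtomicInteger`, the toggling thread could write back a stale value and silently drop `BEING_REDEFINED`, which is the kind of failure the assert in the bug title would catch.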
------------- Commit messages: - 8276177: nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with "assert(def_ik->is_being_redefined()) failed: should be being redefined to get here" Changes: https://git.openjdk.java.net/jdk/pull/6410/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6410&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276177 Stats: 13 lines in 3 files changed: 7 ins; 2 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6410.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6410/head:pull/6410 PR: https://git.openjdk.java.net/jdk/pull/6410 From duke at openjdk.java.net Tue Nov 16 14:23:07 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Tue, 16 Nov 2021 14:23:07 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v6] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. 
>
> The most important part of this patch is simply compiling using branch
> protection provided by GCC/LLVM. This protects all C++ code from being
> used in ROP attacks, removing all static ROP gadgets from use.
>
> The remainder of the patch adds ROP protection to runtime generated
> code, in both stubs and compiled Java code. Attacks here are much harder
> as ROP gadgets must be found dynamically at runtime. If/when AOT
> compilation is added to JDK, then all stubs and compiled Java will be
> susceptible to ROP gadgets being found by static analysis and therefore
> potentially as vulnerable as C++ code.
>
> There are a number of places where the VM changes control flow by
> rewriting the stack or otherwise. I've done some analysis as to how
> these could also be used for attacks (which I didn't want to post here).
> These areas can be protected by ensuring the pointers to various stubs and
> entry points are stored in memory as signed pointers. These changes are
> simple to make (they can be reduced to a type change in common code and
> a few additional sign/auth calls in the backend), but there are a lot of them
> and the total code change is fairly large. I'm happy to provide a few
> work in progress patches.
>
> In order to match the security benefits of the Apple Arm64e ABI across
> the whole of JDK, then all the changes mentioned above would be
> required.

Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:

 - Merge master
 - Rename pauth_authenticate_or_strip_return_address
 - Fix windows aarch64 by restoring pauth file split
 - Don't keep LR live across restore_live_registers
 - Merge master
 - Document pauth functions && remove OS split
 - Update UseROPProtection description
 - Simplify branch protection configure check
 - 8264130: PAC-RET protection for Linux/AArch64

   PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One of its uses is to protect against ROP based attacks.
This is done by signing the Link Register whenever it is stored on the stack, and authenticating the value when it is loaded back from the stack. If an attacker were to try to change control flow by editing the stack then the authentication check of the Link Register will fail, causing a segfault when the function returns.

   On a system with PAC enabled, it is expected that all applications will be compiled with ROP protection. Fedora 33 and upwards already provide this. By compiling for ARMv8.0, GCC and LLVM will only use the set of PAC instructions that exist in the NOP space - on hardware without PAC, these instructions act as NOPs, allowing backward compatibility for negligible performance cost (2 NOPs per non-leaf function).

   Hardware is currently limited to the Apple M1 MacBooks. All testing has been done within a Fedora Docker image. A run of SpecJVM showed no difference to that of noise - which was surprising.

   The most important part of this patch is simply compiling using branch protection provided by GCC/LLVM. This protects all C++ code from being used in ROP attacks, removing all static ROP gadgets from use.

   The remainder of the patch adds ROP protection to runtime generated code, in both stubs and compiled Java code. Attacks here are much harder as ROP gadgets must be found dynamically at runtime. If/when AOT compilation is added to JDK, then all stubs and compiled Java will be susceptible to ROP gadgets being found by static analysis and therefore potentially as vulnerable as C++ code.

   There are a number of places where the VM changes control flow by rewriting the stack or otherwise. I've done some analysis as to how these could also be used for attacks (which I didn't want to post here). These areas can be protected by ensuring the pointers to various stubs and entry points are stored in memory as signed pointers.
These changes are simple to make (they can be reduced to a type change in common code and a few additional sign/auth calls in the backend), but there are a lot of them and the total code change is fairly large. I'm happy to provide a few work in progress patches.

   In order to match the security benefits of the Apple Arm64e ABI across the whole of JDK, then all the changes mentioned above would be required.
 - Add PAC assembly instructions
 - ... and 2 more: https://git.openjdk.java.net/jdk/compare/b8d33a2a...deb17a56

-------------

Changes: https://git.openjdk.java.net/jdk/pull/6334/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=05
  Stats: 1347 lines in 25 files changed: 521 ins; 18 del; 808 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6334.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334

PR: https://git.openjdk.java.net/jdk/pull/6334

From hseigel at openjdk.java.net Tue Nov 16 15:56:00 2021
From: hseigel at openjdk.java.net (Harold Seigel)
Date: Tue, 16 Nov 2021 15:56:00 GMT
Subject: RFR: 8276795: Deprecate seldom used CDS flags [v2]
In-Reply-To:
References:
Message-ID: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com>

> Please review this small change to deprecate seldom used CDS flags. The flags will be deprecated in 18, obsoleted in 19, and removed in a later release.
>
> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64.
> > Thanks, Harold Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: Add (Deprecated) to comments and add options to deprecated test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6390/files - new: https://git.openjdk.java.net/jdk/pull/6390/files/aad3f00b..9d49730e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6390&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6390&range=00-01 Stats: 11 lines in 2 files changed: 4 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/6390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6390/head:pull/6390 PR: https://git.openjdk.java.net/jdk/pull/6390 From hseigel at openjdk.java.net Tue Nov 16 15:56:01 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 16 Nov 2021 15:56:01 GMT Subject: RFR: 8276795: Deprecate seldom used CDS flags In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 14:50:43 GMT, Harold Seigel wrote: > Please review this small change to deprecate seldom used CDS flags. The flags will be deprecated in 18, obsoleted in 19, and removed in a later release. > > The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. > > Thanks, Harold David, thanks for looking at this change. Please review the updated commit. It contains the needed additional changes that you pointed out. 
Thanks, Harold ------------- PR: https://git.openjdk.java.net/jdk/pull/6390 From ccheung at openjdk.java.net Tue Nov 16 17:36:36 2021 From: ccheung at openjdk.java.net (Calvin Cheung) Date: Tue, 16 Nov 2021 17:36:36 GMT Subject: RFR: 8276795: Deprecate seldom used CDS flags [v2] In-Reply-To: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com> References: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com> Message-ID: On Tue, 16 Nov 2021 15:56:00 GMT, Harold Seigel wrote: >> Please review this small change to deprecate seldom used CDS flags. The flags will be deprecated in 18, obsoleted in 19, and removed in a later release. >> >> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. >> >> Thanks, Harold > > Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: > > Add (Deprecated) to comments and add options to deprecated test LGTM. ------------- Marked as reviewed by ccheung (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6390 From iklam at openjdk.java.net Tue Nov 16 17:40:39 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 16 Nov 2021 17:40:39 GMT Subject: RFR: 8276795: Deprecate seldom used CDS flags [v2] In-Reply-To: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com> References: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com> Message-ID: On Tue, 16 Nov 2021 15:56:00 GMT, Harold Seigel wrote: >> Please review this small change to deprecate seldom used CDS flags. The flags will be deprecated in 18, obsoleted in 19, and removed in a later release. >> >> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. 
>> >> Thanks, Harold > > Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: > > Add (Deprecated) to comments and add options to deprecated test LGTM ------------- Marked as reviewed by iklam (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6390 From jorn.vernee at oracle.com Tue Nov 16 17:51:02 2021 From: jorn.vernee at oracle.com (Jorn Vernee) Date: Tue, 16 Nov 2021 18:51:02 +0100 Subject: Questions about oop handling for Panama upcalls. Message-ID: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com> Hi, For panama-foreign upcalls we spin our own upcall stubs that wrap a method handle VM entry for the actual upcall. I want to make sure I have the oop handling correct on this. We receive a list of arguments from native code (all primitives, so no oops to handle there), and then prefix that list with a MethodHandle oop, before calling into the MH's VM entry. The MH oop can be stored in three different places: 1. The MH oop is stored in a global JNI handle, and then resolved right before the upcall [1]. 2. The MH oop is then stored in the first argument register j_rarg0 for the call. 3. During a deopt of the callee, the deoptimization code spills the receiver (MH oop) into the frame of the upcall stub. (looks like the extending of the frame that happens for instance in c2i adapters doesn't make room for the receiver?). I don't think I need to do anything else for 1., but for 2. and 3. there is currently no handling. I wanted to ask how those cases should be handled, if at all. I think 2. could in theory be addressed by implementing CodeBlob::preserve_callee_argument_oops. Though, it has been working fine so far without this, so I'm wondering if this is even needed. Is the caller or callee responsible for handling argument oops (seems to be caller, from looking at CompiledMethod::preserve_callee_argument_oops)? 
Or does the caller just handle the receiver if there is one (since deopt spills that into the caller's frame)? The oop offset is passed to an OopClosure in CompiledArgumentOopFinder::handle_oop_offset as an oop* [2]. Does the argument register get spilled somewhere and the oop needs to be patched in place at that address (by the OopClosure)? Or is this just used to mark the oop as alive? (in the latter case, the JNI global should be enough I think).

I think 3. could be handled with an OopMap entry at the frame offset where the receiver is spilled during a deopt of the callee? Should it be an oop or a narrowOop, or does it depend on VM settings? FWIW, the deopt code always seems to need a machine word (64 bits) to do the spilling, so I think it's an oop? Do I need to zero out that part of the frame when allocating the frame so that the GC doesn't mistake some garbage that's in there for an oop?

I have a POC patch here for reference [3], that implements the 2 things above. This passes our test suite, but I'm not sure about the correctness.

Looking at what JNI does for upcalls [4], I don't see how e.g. the receiver argument that is put on the stack is handled, or what happens when the callee deopts (though I think it would just overwrite the value on the stack that's there already, since JNI always seems to do interpreted calls, where we do compiled calls). But, JNI/the call stub might be special cased elsewhere...

Also, the oop is briefly stored in rscratch1 when resolving. I'm interested to know when the GC can look at the frame and register state, especially with concurrent GCs in mind. I'm assuming it's only during the call to the MH VM entry (but the existence of frame::safe_for_sender makes me less sure)? AFAIK the call counts as a safepoint (with oop map for it typically stored at the return offset). At this safepoint, the oop can only be stored at one of the 3 places listed at the start.
Thanks, Jorn [1] : https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L416 [2] : https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/frame.cpp#L939-L946 [3] : https://github.com/openjdk/panama-foreign/compare/foreign-memaccess+abi...JornVernee:Deopt_Crash [4] : https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L339 From duke at openjdk.java.net Tue Nov 16 18:21:50 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 16 Nov 2021 18:21:50 GMT Subject: RFR: 8277137: Set OnSpinWaitInst default value to "isb" for Arm Neoverse N1 Message-ID: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. Testing: - `make test TEST=gtest`: Passed - `make run-test TEST=tier1`: Passed - `make run-test TEST=tier2`: Passed - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed ------------- Commit messages: - 8277137: Set OnSpinWaitInst default value to "isb" for Arm Neoverse N1 Changes: https://git.openjdk.java.net/jdk/pull/6415/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6415&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277137 Stats: 100 lines in 2 files changed: 100 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6415.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6415/head:pull/6415 PR: https://git.openjdk.java.net/jdk/pull/6415 From phh at openjdk.java.net Tue Nov 16 19:00:40 2021 From: phh at openjdk.java.net (Paul Hohensee) Date: Tue, 16 Nov 2021 19:00:40 GMT Subject: RFR: 8277137: Set OnSpinWaitInst default value to "isb" for Arm Neoverse N1 In-Reply-To: 
<-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: On Tue, 16 Nov 2021 18:14:15 GMT, Evgeny Astigeevich wrote: > One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. > > Testing: > - `make test TEST=gtest`: Passed > - `make run-test TEST=tier1`: Passed > - `make run-test TEST=tier2`: Passed > - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed I'd explicitly set OnSpinWaitInstCount because one has to go find the default value in another file to understand what's going to happen. So I'd add: FLAG_SET_DEFAULT(OnSpinWaitInstCount, 1); ------------- Changes requested by phh (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6415 From duke at openjdk.java.net Tue Nov 16 19:15:11 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 16 Nov 2021 19:15:11 GMT Subject: RFR: 8277137: Set OnSpinWaitInst default value to "isb" for Arm Neoverse N1 [v2] In-Reply-To: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: > One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. 
> > Testing: > - `make test TEST=gtest`: Passed > - `make run-test TEST=tier1`: Passed > - `make run-test TEST=tier2`: Passed > - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Explicitly set OnSpinWaitInstCount to 1 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6415/files - new: https://git.openjdk.java.net/jdk/pull/6415/files/b3b8a23e..56258906 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6415&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6415&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6415.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6415/head:pull/6415 PR: https://git.openjdk.java.net/jdk/pull/6415 From duke at openjdk.java.net Tue Nov 16 19:15:14 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 16 Nov 2021 19:15:14 GMT Subject: RFR: 8277137: Set OnSpinWaitInst default value to "isb" for Arm Neoverse N1 [v2] In-Reply-To: References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: On Tue, 16 Nov 2021 18:57:53 GMT, Paul Hohensee wrote: > I'd explicitly set OnSpinWaitInstCount because one has to go find the default value in another file to understand what's going to happen. So I'd add: > > FLAG_SET_DEFAULT(OnSpinWaitInstCount, 1); Thank you for reviewing. Done. 
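For readers following along from the Java side: the flags discussed in this thread (`OnSpinWaitInst`, `OnSpinWaitInstCount`) affect what HotSpot emits on AArch64 for calls to `Thread.onSpinWait()`. The following is a minimal sketch of the kind of busy-wait loop that would exercise that code path. The class and method names (`SpinWaitDemo`, `spinUntilSet`, `runOnce`) are made up for illustration and do not come from the PR or its tests.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinWaitDemo {
    // Busy-waits until the flag becomes true and returns how many times it spun.
    // Thread.onSpinWait() is only a hint; the JVM may emit nothing for it.
    static long spinUntilSet(AtomicBoolean flag) {
        long spins = 0;
        while (!flag.get()) {
            Thread.onSpinWait();
            spins++;
        }
        return spins;
    }

    // Starts a thread that sets the flag, spins until the write is visible,
    // and reports the final flag value.
    static boolean runOnce() {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread publisher = new Thread(() -> ready.set(true));
        publisher.start();
        spinUntilSet(ready);
        try {
            publisher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ready.get();
    }

    public static void main(String[] args) {
        System.out.println("flag observed: " + runOnce());
    }
}
```

The loop behaves the same regardless of the flag values; only the instruction sequence emitted for the hint, and therefore the timing, would be expected to differ.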
------------- PR: https://git.openjdk.java.net/jdk/pull/6415 From duke at openjdk.java.net Wed Nov 17 03:53:58 2021 From: duke at openjdk.java.net (Fei Gao) Date: Wed, 17 Nov 2021 03:53:58 GMT Subject: RFR: 8274179: AArch64: Support SVE operations with encodable immediates [v5] In-Reply-To: References: Message-ID: > for(int i = 0; i < LENGTH; i++) { > c[i] = a[i] + 2; > } > > For the case showed above, after superword optimization with SVE, > without the patch, the vector add operation always has 2 z-reg inputs, > like: > mov z16.s, #2 > add z17.s, z17.s, z16.s > > Considering sve has supported basic binary operations with immediate, > this pattern could be further optimized to: > add z16.s, z16.s, #2 > > To implement it, we added some new match rules and assembler rules in > the aarch64 backend. We also made some extensions on immediate types > and functions to keep backward compatible. > > With the patch, only these binary integer vector operations, +(add), > -(sub), &(and), |(orr), and ^(eor) with immediate are supported for > the optimization. Other vector operations are not supported currently. > > Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64 > CPU, no new failure. > > There is no obvious performance uplift but it can help remove one > redundant mov instruction. Fei Gao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: - Regenerate the asmtest.out.h file for aarch64 after rebasing Change-Id: I1292449268c73c8f84cc3ffa7a4c859cf79058eb - Merge branch 'master' of github.com:fg1417/jdk into fg1417-20211026 Change-Id: I2004dc45f7f0ab44bc22b48083b185e7b3bd5eea - Add some assertion lines for help functions Change-Id: Ic9120902bd8f8a8ead2e3740435a40f35d21757c - Split the original patch and leave the existing logic in Assembler entirely untouched Change-Id: If8ddcef07b15615d7dd0c3063c44d2b705fac6f7 - Merge branch 'master' of github.com:fg1417/jdk into fg1417-20211026 Change-Id: I52aa66d200b74ac312c5d40283b94854bc1142e6 - 8274179: AArch64: Support SVE operations with encodable immediates for(int i = 0; i < LENGTH; i++) { c[i] = a[i] + 2; } For the case showed above, after superword optimization with SVE, without the patch, the vector add operation always has 2 z-reg inputs, like: mov z16.s, #2 add z17.s, z17.s, z16.s Considering sve has supported basic binary operations with immediate, this pattern could be further optimized to: add z16.s, z16.s, #2 To implement it, we added some new match rules and assembler rules in the aarch64 backend. We also made some extensions on immediate types and functions to keep backward compatible. With the patch, only these binary integer vector operations, +(add), -(sub), &(and), |(orr), and ^(eor) with immediate are supported for the optimization. Other vector operations are not supported currently. Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64 CPU, no new failure. There is no obvious performance uplift but it can help remove one redundant mov instruction. 
Change-Id: Iaec40e362918118691083fb171cc4dff390b35a2 ------------- Changes: https://git.openjdk.java.net/jdk/pull/6115/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6115&range=04 Stats: 1476 lines in 12 files changed: 1329 ins; 43 del; 104 mod Patch: https://git.openjdk.java.net/jdk/pull/6115.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6115/head:pull/6115 PR: https://git.openjdk.java.net/jdk/pull/6115 From david.holmes at oracle.com Wed Nov 17 06:56:23 2021 From: david.holmes at oracle.com (David Holmes) Date: Wed, 17 Nov 2021 16:56:23 +1000 Subject: Is it necessary to check subtype for invokeinterface of private method? In-Reply-To: <3d8e3679-30db-ad9c-d68a-6184fd2d008c@amazon.com> References: <3d8e3679-30db-ad9c-d68a-6184fd2d008c@amazon.com> Message-ID: Hi Xin, On 16/11/2021 7:44 pm, Liu, Xin wrote: > Hi, > > I am working on the regex performance in JDK-8274983. Even though it > begins with regex, the problem boils down to invokeinterface to the > private interface methods. Before hidden class (JDK-8238358), lambda > meta factory generates invokespecial for them. Now it generates > invokeinterface instead. C1 doesn't recognize the new code pattern and > generates an ic virtual call for the callsite. If many classes all > implement a common interface, they trash the ic stub because the > concrete classes are different. InvokePrivateInterfaceMethod.java with > -XX:+TraceCallFixup can reveal this pathological slowness. > > Is it the intentional behavior of C1? I see that C2 actually generates > checkcast code sequence for this case. I would like to patch up C1 > because C1 plays an important role for the startup time. Can't comment on C1 issue. > I have a patch to let C1 treats invokeinterface private interface > methods as invokespecial. In other words, I treat the private interface > methods as effective final. It runs pretty well until I encounter the > regression ./test/jdk/java/lang/invoke/PrivateInterfaceCall.java. 
That > leads me to the second question. > > In my understanding, the code unsafeCastI2() is essentially a typecast > of a function pointer, isn't it? My take of "unsafe" in "unsafeCastI2" > is that its behavior undefined, then why we need to check ICCE here? "unsafe" means that the validity of the cast can't be checked immediately, but it will be checked when the result is actually used - it is not "undefined behaviour". The VM and MethodHandle specifications require the subtype check on the receiver: JVMS 6.5 invoke_interface - Run-time Exception: Otherwise, if the class of objectref does not implement the resolved interface, invokeinterface throws an IncompatibleClassChangeError. > System.out.println("ICCE PrivateInterfaceCall.invokeDirect D1"); > shouldThrowICCE(() -> > PrivateInterfaceCall.invokeDirect(unsafeCastI2(new D1()))) > > static I2 unsafeCastI2(Object obj) { > try { > MethodHandle mh = MethodHandles.identity(Object.class); > mh = MethodHandles.explicitCastArguments(mh, > mh.type().changeReturnType(I2.class)); > return (I2)mh.invokeExact((Object) obj); > } catch (Throwable e) { > throw new Error(e); > } > } > > In real world, how much meaningful we detect the error if we > accidentally invoke a private interface method where we actually don't > implement that interface. I think it's only possible via methodhandle > and jasm, right? if we say it's undefined behavior, I think we can skip > typecheck. It would make lambda code faster. 
I think there are potential security considerations here, but if you want to make a case for change then email: jls-jvms-spec-comments at openjdk.java.net Cheers, David > thanks, > --lx > From ngasson at openjdk.java.net Wed Nov 17 07:42:36 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 17 Nov 2021 07:42:36 GMT Subject: RFR: 8277137: Set OnSpinWaitInst default value to "isb" for Arm Neoverse N1 [v2] In-Reply-To: References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: On Tue, 16 Nov 2021 19:15:11 GMT, Evgeny Astigeevich wrote: >> One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. >> >> Testing: >> - `make test TEST=gtest`: Passed >> - `make run-test TEST=tier1`: Passed >> - `make run-test TEST=tier2`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Explicitly set OnSpinWaitInstCount to 1 src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 206: > 204: } > 205: > 206: if (FLAG_IS_DEFAULT(OnSpinWaitInst) && FLAG_IS_DEFAULT(OnSpinWaitInstCount)) { Should these two be set independently? If I pass `-XX:OnSpinWaitInstCount=2` then `OnSpinWaitInst` will default to "none". ------------- PR: https://git.openjdk.java.net/jdk/pull/6415 From xxinliu at amazon.com Wed Nov 17 08:22:10 2021 From: xxinliu at amazon.com (Liu, Xin) Date: Wed, 17 Nov 2021 00:22:10 -0800 Subject: Is it necessary to check subtype for invokeinterface of private method? In-Reply-To: References: <3d8e3679-30db-ad9c-d68a-6184fd2d008c@amazon.com> Message-ID: <687da655-903b-b6b7-2b38-23e3cf722106@amazon.com> Hi David, Thank you for the heads-up! Now I understand why C2 generates the checkcast code for invokespecial and invokeinterface.
It must conform to the JVM spec. C1 does the checkcast for invokespecial right now. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_GraphBuilder.cpp#L1874 ./test/jdk/java/lang/invoke/PrivateInterfaceCall.java is correct. The ICCE is expected. I will implement invokeinterface by the book. I don't intend to challenge the JVM spec. Private interface methods are new to me. Thinking about this again: there's no polymorphism for them, so C1/C2 can use relocInfo::opt_virtual_call_type to optimize their call sites, but the type check still needs to be in place! thanks, --lx On 11/16/21 10:56 PM, David Holmes wrote: > Hi Xin, > > On 16/11/2021 7:44 pm, Liu, Xin wrote: >> Hi, >> >> I am working on the regex performance in JDK-8274983. Even though it >> begins with regex, the problem boils down to invokeinterface to the >> private interface methods. Before hidden class (JDK-8238358), lambda >> meta factory generates invokespecial for them. Now it generates >> invokeinterface instead. C1 doesn't recognize the new code pattern and >> generates an ic virtual call for the callsite. If many classes all >> implement a common interface, they trash the ic stub because the >> concrete classes are different. InvokePrivateInterfaceMethod.java with >> -XX:+TraceCallFixup can reveal this pathological slowness. >> >> Is it the intentional behavior of C1? I see that C2 actually generates >> checkcast code sequence for this case. I would like to patch up C1 >> because C1 plays an important role for the startup time. > > Can't comment on C1 issue. > >> I have a patch to let C1 treats invokeinterface private interface >> methods as invokespecial. In other words, I treat the private interface >> methods as effective final.
It runs pretty well until I encounter the >> regression ./test/jdk/java/lang/invoke/PrivateInterfaceCall.java. That >> leads me to the second question. >> >> In my understanding, the code unsafeCastI2() is essentially a typecast >> of a function pointer, isn't it? My take of "unsafe" in "unsafeCastI2" >> is that its behavior undefined, then why we need to check ICCE here? > > "unsafe" means that the validity of the cast can't be checked > immediately, but it will be checked when the result is actually used - > it is not "undefined behaviour". The VM and MethodHandle specifications > require the subtype check on the receiver: > > JVMS 6.5 invoke_interface - Run-time Exception: Otherwise, if the class > of objectref does not implement the resolved interface, invokeinterface > throws an IncompatibleClassChangeError. > >> System.out.println("ICCE PrivateInterfaceCall.invokeDirect D1"); >> shouldThrowICCE(() -> >> PrivateInterfaceCall.invokeDirect(unsafeCastI2(new D1()))) >> >> static I2 unsafeCastI2(Object obj) { >> try { >> MethodHandle mh = MethodHandles.identity(Object.class); >> mh = MethodHandles.explicitCastArguments(mh, >> mh.type().changeReturnType(I2.class)); >> return (I2)mh.invokeExact((Object) obj); >> } catch (Throwable e) { >> throw new Error(e); >> } >> } >> >> In real world, how much meaningful we detect the error if we >> accidentally invoke a private interface method where we actually don't >> implement that interface. I think it's only possible via methodhandle >> and jasm, right? if we say it's undefined behavior, I think we can skip >> typecheck. It would make lambda code faster. 
> > I think there are potential security considerations here, but if you > want to make a case for change then email: > > jls-jvms-spec-comments at openjdk.java.net > > Cheers, > David > > >> thanks, >> --lx >> From forax at univ-mlv.fr Wed Nov 17 08:53:32 2021 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 17 Nov 2021 09:53:32 +0100 (CET) Subject: Is it necessary to check subtype for invokeinterface of private method? In-Reply-To: <687da655-903b-b6b7-2b38-23e3cf722106@amazon.com> References: <3d8e3679-30db-ad9c-d68a-6184fd2d008c@amazon.com> <687da655-903b-b6b7-2b38-23e3cf722106@amazon.com> Message-ID: <1135408296.1620728.1637139212846.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "Liu, Xin" > To: "David Holmes" , "hotspot-dev" > Cc: "Simonis, Volker" , "Dean Long" > Sent: Wednesday 17 November 2021 09:22:10 > Subject: Re: Is it necessary to check subtype for invokeinterface of private method? > Hi, David, > > Thanks you the head-up! Now I understand why c2 generates the checkcast > code for invokespecial and invokeinterface. It must conform to the JVM > spec. > > C1 does the checkcast for invokespecial right now. > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_GraphBuilder.cpp#L1874 > > ./test/jdk/java/lang/invoke/PrivateInterfaceCall.java is correct. The > ICCE is expected. I will implement invokeinterface by book. > > I don't intend the challenge the JVM spec. The private interface methods > is new for me. I double think about this. There's no polymorphism for > them, so c1/c2 can use relocInfo::opt_virtual_call_type to optimize the > callsites of them, but the typecheck still needs to be in place! Yes, the type check is needed because the bytecode verifier does not verify that a value really implements an interface, so a type check has to be inserted. Usually, a failure means that the world seen by the compiler and the world seen by the VM are not the same, hence the IncompatibleClassChangeError.
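To make the point about the missing verifier check concrete, here is a minimal, self-contained sketch adapted from the `unsafeCastI2` helper quoted earlier in this thread. It is not the JDK test itself: the class name `UncheckedInterfaceCastDemo`, the default method `name()` and the helper `callThrowsICCE()` are invented for illustration, and a public default method is used instead of a private one. The sketch relies on the documented behaviour of `MethodHandles.explicitCastArguments`, which passes a reference as an interface type without a cast.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class UncheckedInterfaceCastDemo {
    interface I2 {
        default String name() { return "I2"; }
    }

    // Deliberately does NOT implement I2.
    static class D1 { }

    // Same shape as the helper quoted in the thread: the result is typed as I2,
    // but no runtime check that obj implements I2 is performed here.
    static I2 unsafeCastI2(Object obj) {
        try {
            MethodHandle mh = MethodHandles.identity(Object.class);
            mh = MethodHandles.explicitCastArguments(mh,
                    mh.type().changeReturnType(I2.class));
            return (I2) mh.invokeExact((Object) obj);
        } catch (Throwable e) {
            throw new Error(e);
        }
    }

    // Returns true if invoking an I2 method on the bogus receiver is rejected
    // by the receiver check that invokeinterface performs at the call.
    static boolean callThrowsICCE() {
        I2 bogus = unsafeCastI2(new D1());
        try {
            bogus.name();
            return false;
        } catch (IncompatibleClassChangeError expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ICCE thrown: " + callThrowsICCE());
    }
}
```

The cast itself succeeds silently; the error only surfaces at the `invokeinterface` call site, which is why the compilers cannot drop the receiver check even when the target method is private and therefore not overridable.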
> > thanks, > --lx regards, Rémi > > > On 11/16/21 10:56 PM, David Holmes wrote: >> Hi Xin, >> >> On 16/11/2021 7:44 pm, Liu, Xin wrote: >>> Hi, >>> >>> I am working on the regex performance in JDK-8274983. Even though it >>> begins with regex, the problem boils down to invokeinterface to the >>> private interface methods. Before hidden class (JDK-8238358), lambda >>> meta factory generates invokespecial for them. Now it generates >>> invokeinterface instead. C1 doesn't recognize the new code pattern and >>> generates an ic virtual call for the callsite. If many classes all >>> implement a common interface, they trash the ic stub because the >>> concrete classes are different. InvokePrivateInterfaceMethod.java with >>> -XX:+TraceCallFixup can reveal this pathological slowness. >>> >>> Is it the intentional behavior of C1? I see that C2 actually generates >>> checkcast code sequence for this case. I would like to patch up C1 >>> because C1 plays an important role for the startup time. >> >> Can't comment on C1 issue. >> >>> I have a patch to let C1 treats invokeinterface private interface >>> methods as invokespecial. In other words, I treat the private interface >>> methods as effective final. It runs pretty well until I encounter the >>> regression ./test/jdk/java/lang/invoke/PrivateInterfaceCall.java. That >>> leads me to the second question. >>> >>> In my understanding, the code unsafeCastI2() is essentially a typecast >>> of a function pointer, isn't it? My take of "unsafe" in "unsafeCastI2" >>> is that its behavior undefined, then why we need to check ICCE here? >> >> "unsafe" means that the validity of the cast can't be checked >> immediately, but it will be checked when the result is actually used - >> it is not "undefined behaviour".
The VM and MethodHandle specifications >> require the subtype check on the receiver: >> >> JVMS 6.5 invoke_interface - Run-time Exception: Otherwise, if the class >> of objectref does not implement the resolved interface, invokeinterface >> throws an IncompatibleClassChangeError. >> >>> System.out.println("ICCE PrivateInterfaceCall.invokeDirect D1"); >>> shouldThrowICCE(() -> >>> PrivateInterfaceCall.invokeDirect(unsafeCastI2(new D1()))) >>> >>> static I2 unsafeCastI2(Object obj) { >>> try { >>> MethodHandle mh = MethodHandles.identity(Object.class); >>> mh = MethodHandles.explicitCastArguments(mh, >>> mh.type().changeReturnType(I2.class)); >>> return (I2)mh.invokeExact((Object) obj); >>> } catch (Throwable e) { >>> throw new Error(e); >>> } >>> } >>> >>> In real world, how much meaningful we detect the error if we >>> accidentally invoke a private interface method where we actually don't >>> implement that interface. I think it's only possible via methodhandle >>> and jasm, right? if we say it's undefined behavior, I think we can skip >>> typecheck. It would make lambda code faster. 
>> >> I think there are potential security considerations here, but if you >> want to make a case for change then email: >> >> jls-jvms-spec-comments at openjdk.java.net >> >> Cheers, >> David >> >> >>> thanks, >>> --lx From duke at openjdk.java.net Wed Nov 17 09:30:45 2021 From: duke at openjdk.java.net (Fei Gao) Date: Wed, 17 Nov 2021 09:30:45 GMT Subject: RFR: 8274179: AArch64: Support SVE operations with encodable immediates In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 11:37:23 GMT, Andrew Haley wrote: >> for(int i = 0; i < LENGTH; i++) { >> c[i] = a[i] + 2; >> } >> >> For the case showed above, after superword optimization with SVE, >> without the patch, the vector add operation always has 2 z-reg inputs, >> like: >> mov z16.s, #2 >> add z17.s, z17.s, z16.s >> >> Considering sve has supported basic binary operations with immediate, >> this pattern could be further optimized to: >> add z16.s, z16.s, #2 >> >> To implement it, we added some new match rules and assembler rules in >> the aarch64 backend. We also made some extensions on immediate types >> and functions to keep backward compatible. >> >> With the patch, only these binary integer vector operations, +(add), >> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for >> the optimization. Other vector operations are not supported currently. >> >> Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64 >> CPU, no new failure. >> >> There is no obvious performance uplift but it can help remove one >> redundant mov instruction. > > I'd like you to split this patch into two parts, please. > First, please use the new functions such as `Assembler::operand_valid_for_logical_immediate(bool is32, uint64_t imm)` only for SVE, leaving the existing logic in `Assembler` entirely untouched. This will cause some duplication, but that's OK. We can review changes to merge functionality in a separate patch. This will be much easier. 
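For anyone who wants to look at the generated code for the loop quoted in this thread, a standalone reproducer could look like the sketch below. The class name `AddImmediateLoop`, the array length, and the warm-up count are arbitrary choices, not values taken from the PR. Whether C2 actually vectorizes the loop, and whether it then uses an SVE immediate form, depends on the CPU, the JDK build, and the VM flags in use.

```java
public class AddImmediateLoop {
    static final int LENGTH = 1024;

    // The loop shape from the PR description: add a small constant to each element.
    static void addTwo(int[] a, int[] c) {
        for (int i = 0; i < LENGTH; i++) {
            c[i] = a[i] + 2;
        }
    }

    // Fills the input, calls the kernel repeatedly so the JIT has a chance to
    // compile it, and returns the output array.
    static int[] run() {
        int[] a = new int[LENGTH];
        int[] c = new int[LENGTH];
        for (int i = 0; i < LENGTH; i++) {
            a[i] = i;
        }
        for (int iter = 0; iter < 20_000; iter++) {
            addTwo(a, c);
        }
        return c;
    }

    public static void main(String[] args) {
        int[] c = run();
        System.out.println("c[0]=" + c[0] + ", c[LENGTH-1]=" + c[LENGTH - 1]);
    }
}
```

Inspecting the compiled code for `addTwo` requires a disassembler plugin and diagnostic VM options, so this sketch only sets up the workload and leaves the inspection to the reader's environment.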
Hi @theRealAph , I rebased my patch and retested it internally. Can I have your review :)? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/6115 From erik.osterlund at oracle.com Wed Nov 17 09:42:22 2021 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 17 Nov 2021 09:42:22 +0000 Subject: Questions about oop handling for Panama upcalls. In-Reply-To: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com> References: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com> Message-ID: Hi Jorn, So you have a jobject in the caller, resolve it, and then need to pass the oop around as an argument to the callee. Our current upcall stubs try to quack like an interpreter in many ways, so that it will look like an i-2-something call. I think you can either try to do the same quacking dance, to pass the oop to the callee, or alternatively the primary question for me seems to be who is the callee? You have a very fixed format for the call, which makes me suspect the callee is some kind of JDK internal code. Another way of dealing with this would be to pass the jobject as a long and just resolve it in the callee instead, if this is indeed JDK internal code. Then this becomes a problem that doesn't need to be solved at all. Just sanity checking. /Erik > -----Original Message----- > From: hotspot-dev On Behalf Of Jorn > Vernee > Sent: Tuesday, 16 November 2021 18:51 > To: hotspot-dev at openjdk.java.net > Subject: Questions about oop handling for Panama upcalls. > > Hi, > > For panama-foreign upcalls we spin our own upcall stubs that wrap a method > handle VM entry for the actual upcall. I want to make sure I have the oop > handling correct on this. > > We receive a list of arguments from native code (all primitives, so no oops to > handle there), and then prefix that list with a MethodHandle oop, before > calling into the MH's VM entry. The MH oop can be stored in three different > places: > > 1. 
The MH oop is stored in a global JNI handle, and then resolved right before > the upcall [1]. > 2. The MH oop is then stored in the first argument register j_rarg0 for the > call. > 3. During a deopt of the callee, the deoptimization code spills the receiver > (MH oop) into the frame of the upcall stub. (looks like the extending of the > frame that happens for instance in c2i adapters doesn't make room for the > receiver?). > > I don't think I need to do anything else for 1., but for 2. and 3. there is > currently no handling. I wanted to ask how those cases should be handled, if > at all. > > I think 2. could in theory be addressed by implementing > CodeBlob::preserve_callee_argument_oops. Though, it has been working > fine so far without this, so I'm wondering if this is even needed. Is the caller > or callee responsible for handling argument oops (seems to be caller, from > looking at CompiledMethod::preserve_callee_argument_oops)? > Or does the caller just handle the receiver if there is one (since deopt spills > that into the callers frame)? The oop offset is passed to an OopClosure in > CompiledArgumentOopFinder::handle_oop_offset as an oop* [2]. Does the > argument register get spilled somewhere and the oop needs to be patched > in place at that address (by the OopClosure)? Or is this just used to mark the > oop as alive? (in the latter case, the JNI global should be enough I think). > > I think 3. could be handled with an OopMap entry at the frame offset where > the receiver is spilled during a deopt of the callee? Should it be an oop or a > narrowOop, or does it depend on VM settings? FWIW, the deopt code > always seems to need a machine word (64-bits) to do the spilling, so I think > it's an oop? Do I need to zero out that part of the frame when allocating the > frame so that the GC doesn't mistake some garbage that's in there for an > oop? > > I have a POC patch here for reference [3], that implements the 2 things > above. 
This passes our test suite, but I'm not sure about the correctness. > Looking at what JNI does for upcalls [4], I don't see how e.g. the receiver > argument that is put on the stack is handled, or what happens when the > callee deopts (though I think it would just overwrite the value on the stack > that's there already, since JNI always seems to do interpreted calls, where > we do compiled calls).? But, JNI/the call stub might be special cased > elsewhere... > > Also, the oop is briefly stored in rscratch1 when resolving. I'm interested to > know when the GC can look at the frame and register state, especially with > concurrent GCs in mind. I'm assuming it's only during the call to the MH VM > entry (but the existence of frame::safe_for_sender makes me less sure)? > AFAIK the call counts as a safepoint (with oop map for it typically stored at > the return offset). At this safepoint, the oop can only be stored at one of the > 3 places listed at the start. > > Thanks, > Jorn > > [1] : > https://github.com/openjdk/panama-foreign/blob/foreign- > jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L416 > [2] : > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/fr > ame.cpp#L939-L946 > [3] : > https://github.com/openjdk/panama-foreign/compare/foreign- > memaccess+abi...JornVernee:Deopt_Crash > [4] : > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGe > nerator_x86_64.cpp#L339 From aph at openjdk.java.net Wed Nov 17 09:57:39 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Nov 2021 09:57:39 GMT Subject: RFR: 8274179: AArch64: Support SVE operations with encodable immediates [v5] In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 03:53:58 GMT, Fei Gao wrote: >> for(int i = 0; i < LENGTH; i++) { >> c[i] = a[i] + 2; >> } >> >> For the case showed above, after superword optimization with SVE, >> without the patch, the vector add operation always has 2 z-reg inputs, >> like: >> mov z16.s, #2 >> add 
z17.s, z17.s, z16.s >> >> Considering sve has supported basic binary operations with immediate, >> this pattern could be further optimized to: >> add z16.s, z16.s, #2 >> >> To implement it, we added some new match rules and assembler rules in >> the aarch64 backend. We also made some extensions on immediate types >> and functions to keep backward compatible. >> >> With the patch, only these binary integer vector operations, +(add), >> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for >> the optimization. Other vector operations are not supported currently. >> >> Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64 >> CPU, no new failure. >> >> There is no obvious performance uplift but it can help remove one >> redundant mov instruction. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Regenerate the asmtest.out.h file for aarch64 after rebasing > > Change-Id: I1292449268c73c8f84cc3ffa7a4c859cf79058eb > - Merge branch 'master' of github.com:fg1417/jdk into fg1417-20211026 > > Change-Id: I2004dc45f7f0ab44bc22b48083b185e7b3bd5eea > - Add some assertion lines for help functions > > Change-Id: Ic9120902bd8f8a8ead2e3740435a40f35d21757c > - Split the original patch and leave the existing logic in Assembler entirely untouched > > Change-Id: If8ddcef07b15615d7dd0c3063c44d2b705fac6f7 > - Merge branch 'master' of github.com:fg1417/jdk into fg1417-20211026 > > Change-Id: I52aa66d200b74ac312c5d40283b94854bc1142e6 > - 8274179: AArch64: Support SVE operations with encodable immediates > > for(int i = 0; i < LENGTH; i++) { > c[i] = a[i] + 2; > } > > For the case showed above, after superword optimization with SVE, > without the patch, the vector add operation always has 2 z-reg inputs, > like: > mov z16.s, #2 > add z17.s, z17.s, z16.s > > Considering sve has supported basic binary operations with immediate, > this pattern could be further optimized to: > 
add z16.s, z16.s, #2 > > To implement it, we added some new match rules and assembler rules in > the aarch64 backend. We also made some extensions on immediate types > and functions to keep backward compatible. > > With the patch, only these binary integer vector operations, +(add), > -(sub), &(and), |(orr), and ^(eor) with immediate are supported for > the optimization. Other vector operations are not supported currently. > > Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64 > CPU, no new failure. > > There is no obvious performance uplift but it can help remove one > redundant mov instruction. > > Change-Id: Iaec40e362918118691083fb171cc4dff390b35a2 src/hotspot/cpu/aarch64/aarch64.ad line 2736: > 2734: if (is_vshift_con_pattern(n, m) || > 2735: (UseSVE > 0 && m->Opcode() == Op_VectorStoreMask && n->Opcode() == Op_StoreVector) || > 2736: is_vector_arith_imm_pattern(n, m)) { Indent this line. ------------- PR: https://git.openjdk.java.net/jdk/pull/6115 From aph at openjdk.java.net Wed Nov 17 10:01:37 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Nov 2021 10:01:37 GMT Subject: RFR: 8274179: AArch64: Support SVE operations with encodable immediates [v5] In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 03:53:58 GMT, Fei Gao wrote: >> for(int i = 0; i < LENGTH; i++) { >> c[i] = a[i] + 2; >> } >> >> For the case showed above, after superword optimization with SVE, >> without the patch, the vector add operation always has 2 z-reg inputs, >> like: >> mov z16.s, #2 >> add z17.s, z17.s, z16.s >> >> Considering sve has supported basic binary operations with immediate, >> this pattern could be further optimized to: >> add z16.s, z16.s, #2 >> >> To implement it, we added some new match rules and assembler rules in >> the aarch64 backend. We also made some extensions on immediate types >> and functions to keep backward compatible. 
>> >> With the patch, only these binary integer vector operations, +(add), >> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for >> the optimization. Other vector operations are not supported currently. >> >> Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64 >> CPU, no new failure. >> >> There is no obvious performance uplift but it can help remove one >> redundant mov instruction. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Regenerate the asmtest.out.h file for aarch64 after rebasing > > Change-Id: I1292449268c73c8f84cc3ffa7a4c859cf79058eb > - Merge branch 'master' of github.com:fg1417/jdk into fg1417-20211026 > > Change-Id: I2004dc45f7f0ab44bc22b48083b185e7b3bd5eea > - Add some assertion lines for help functions > > Change-Id: Ic9120902bd8f8a8ead2e3740435a40f35d21757c > - Split the original patch and leave the existing logic in Assembler entirely untouched > > Change-Id: If8ddcef07b15615d7dd0c3063c44d2b705fac6f7 > - Merge branch 'master' of github.com:fg1417/jdk into fg1417-20211026 > > Change-Id: I52aa66d200b74ac312c5d40283b94854bc1142e6 > - 8274179: AArch64: Support SVE operations with encodable immediates > > for(int i = 0; i < LENGTH; i++) { > c[i] = a[i] + 2; > } > > For the case showed above, after superword optimization with SVE, > without the patch, the vector add operation always has 2 z-reg inputs, > like: > mov z16.s, #2 > add z17.s, z17.s, z16.s > > Considering sve has supported basic binary operations with immediate, > this pattern could be further optimized to: > add z16.s, z16.s, #2 > > To implement it, we added some new match rules and assembler rules in > the aarch64 backend. We also made some extensions on immediate types > and functions to keep backward compatible. 
> > With the patch, only these binary integer vector operations, +(add), > -(sub), &(and), |(orr), and ^(eor) with immediate are supported for > the optimization. Other vector operations are not supported currently. > > Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64 > CPU, no new failure. > > There is no obvious performance uplift but it can help remove one > redundant mov instruction. > > Change-Id: Iaec40e362918118691083fb171cc4dff390b35a2 Good job, well done. ------------- Marked as reviewed by aph (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6115 From aph-open at littlepinkcloud.com Wed Nov 17 10:02:34 2021 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Wed, 17 Nov 2021 10:02:34 +0000 Subject: RFR: 8274179: AArch64: Support SVE operations with encodable immediates [v5] In-Reply-To: References: Message-ID: <85cbec88-e5ea-85a8-d3b4-593d1a9d7778@littlepinkcloud.com> On 11/17/21 09:57, Andrew Haley wrote: >> 2734: if (is_vshift_con_pattern(n, m) || >> 2735: (UseSVE > 0 && m->Opcode() == Op_VectorStoreMask && n->Opcode() == Op_StoreVector) || >> 2736: is_vector_arith_imm_pattern(n, m)) { > Indent this line. Sorry, that was a mistake. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From duke at openjdk.java.net Wed Nov 17 10:09:38 2021 From: duke at openjdk.java.net (Fei Gao) Date: Wed, 17 Nov 2021 10:09:38 GMT Subject: RFR: 8274179: AArch64: Support SVE operations with encodable immediates In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 11:37:23 GMT, Andrew Haley wrote: >> for(int i = 0; i < LENGTH; i++) { >> c[i] = a[i] + 2; >> } >> >> For the case showed above, after superword optimization with SVE, >> without the patch, the vector add operation always has 2 z-reg inputs, >> like: >> mov z16.s, #2 >> add z17.s, z17.s, z16.s >> >> Considering sve has supported basic binary operations with immediate, >> this pattern could be further optimized to: >> add z16.s, z16.s, #2 >> >> To implement it, we added some new match rules and assembler rules in >> the aarch64 backend. We also made some extensions on immediate types >> and functions to keep backward compatible. >> >> With the patch, only these binary integer vector operations, +(add), >> -(sub), &(and), |(orr), and ^(eor) with immediate are supported for >> the optimization. Other vector operations are not supported currently. >> >> Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64 >> CPU, no new failure. >> >> There is no obvious performance uplift but it can help remove one >> redundant mov instruction. > > I'd like you to split this patch into two parts, please. > First, please use the new functions such as `Assembler::operand_valid_for_logical_immediate(bool is32, uint64_t imm)` only for SVE, leaving the existing logic in `Assembler` entirely untouched. This will cause some duplication, but that's OK. We can review changes to merge functionality in a separate patch. This will be much easier. 
Thanks :) @theRealAph ------------- PR: https://git.openjdk.java.net/jdk/pull/6115 From thartmann at openjdk.java.net Wed Nov 17 11:48:49 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 17 Nov 2021 11:48:49 GMT Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg Message-ID: Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment: https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541 The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though because C2 might still compile that path when not being able to prove that it's unreachable at parse time. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in ` VectorSupport::vop2ideal`. We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0: https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394 Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug? 
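A scalar sketch of the equivalence that the `NEG`-as-`SUB` fallback described above relies on, assuming plain `long` arrays stand in for vector lanes. The class and method names are illustrative only and are not taken from `LongVector` or from C2:

```java
import java.util.Arrays;

class NegAsSub {
    // Lane-wise negation written as subtraction from zero (scalar model of NEG -> SUB).
    static long[] negViaSub(long[] a) {
        long[] r = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = 0L - a[i];
        }
        return r;
    }

    // Lane-wise negation using the unary minus operator, for comparison.
    static long[] negDirect(long[] a) {
        long[] r = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = -a[i];
        }
        return r;
    }

    public static void main(String[] args) {
        // Long.MIN_VALUE is included because both forms wrap around to the same value.
        long[] in = {0L, 1L, -7L, Long.MAX_VALUE, Long.MIN_VALUE};
        System.out.println(Arrays.equals(negViaSub(in), negDirect(in)));
    }
}
```

Both forms agree for every lane value, which is why the Java side can avoid passing `NEG` down at all; the bug discussed here is only about C2 still compiling the path that does.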
Thanks, Tobias ------------- Commit messages: - 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg Changes: https://git.openjdk.java.net/jdk/pull/6428/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6428&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275643 Stats: 51 lines in 2 files changed: 51 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6428.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6428/head:pull/6428 PR: https://git.openjdk.java.net/jdk/pull/6428 From duke at openjdk.java.net Wed Nov 17 12:04:34 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 17 Nov 2021 12:04:34 GMT Subject: RFR: 8277137: Set OnSpinWaitInst default value to "isb" for Arm Neoverse N1 [v2] In-Reply-To: References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: On Wed, 17 Nov 2021 07:39:29 GMT, Nick Gasson wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Explicitly set OnSpinWaitInstCount to 1 > > src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 206: > >> 204: } >> 205: >> 206: if (FLAG_IS_DEFAULT(OnSpinWaitInst) && FLAG_IS_DEFAULT(OnSpinWaitInstCount)) { > > Should these two be set independently? If I pass `-XX:OnSpinWaitInstCount=2` then `OnSpinWaitInst` will default to "none". Hi Nick, Thank you for reviewing the PR. > Should these two be set independently? I don't mind. 
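To make Nick's point concrete, here is a minimal Java model of the two defaulting strategies. It is a sketch only: the real logic is C++ in `vm_version_aarch64.cpp`, and the `Flags` class and both `apply...` methods are hypothetical stand-ins for the VM flag machinery.

```java
class SpinWaitDefaults {
    // Hypothetical model of two VM flags plus their "still at default" bits.
    static final class Flags {
        String inst;
        int count;
        final boolean instIsDefault;
        final boolean countIsDefault;

        Flags(String inst, boolean instIsDefault, int count, boolean countIsDefault) {
            this.inst = inst;
            this.instIsDefault = instIsDefault;
            this.count = count;
            this.countIsDefault = countIsDefault;
        }
    }

    // Coupled check: CPU-specific values are applied only if BOTH flags are untouched.
    static void applyCoupled(Flags f) {
        if (f.instIsDefault && f.countIsDefault) {
            f.inst = "isb";
            f.count = 1;
        }
    }

    // Independent checks: each flag gets its CPU-specific value unless the user set it.
    static void applyIndependent(Flags f) {
        if (f.instIsDefault) {
            f.inst = "isb";
        }
        if (f.countIsDefault) {
            f.count = 1;
        }
    }

    public static void main(String[] args) {
        // Models a run where only the count was given on the command line (value 2).
        Flags coupled = new Flags("none", true, 2, false);
        applyCoupled(coupled);
        System.out.println(coupled.inst + " x" + coupled.count);   // none x2

        Flags independent = new Flags("none", true, 2, false);
        applyIndependent(independent);
        System.out.println(independent.inst + " x" + independent.count); // isb x2
    }
}
```

With the coupled check, setting only the count silently leaves the instruction at "none"; with independent checks the instruction still picks up the CPU-specific default.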
------------- PR: https://git.openjdk.java.net/jdk/pull/6415 From chagedorn at openjdk.java.net Wed Nov 17 12:25:38 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 17 Nov 2021 12:25:38 GMT Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 11:41:04 GMT, Tobias Hartmann wrote: > Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment: > https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541 > > The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though because C2 might still compile that path when not being able to prove that it's unreachable at parse time. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in ` VectorSupport::vop2ideal`. We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0: > https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394 > > Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug? > > Thanks, > Tobias That looks good to me! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6428 From duke at openjdk.java.net Wed Nov 17 12:31:10 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 17 Nov 2021 12:31:10 GMT Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 [v3] In-Reply-To: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: > One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. > > Testing: > - `make test TEST=gtest`: Passed > - `make run-test TEST=tier1`: Passed > - `make run-test TEST=tier2`: Passed > - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6415/files - new: https://git.openjdk.java.net/jdk/pull/6415/files/56258906..a9edcca6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6415&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6415&range=01-02 Stats: 9 lines in 2 files changed: 5 ins; 1 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6415.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6415/head:pull/6415 PR: https://git.openjdk.java.net/jdk/pull/6415 From duke at openjdk.java.net Wed Nov 17 12:31:12 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 17 Nov 2021 12:31:12 GMT Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 [v2] In-Reply-To: References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> 
Message-ID: On Wed, 17 Nov 2021 12:01:12 GMT, Evgeny Astigeevich wrote: >> src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 206: >> >>> 204: } >>> 205: >>> 206: if (FLAG_IS_DEFAULT(OnSpinWaitInst) && FLAG_IS_DEFAULT(OnSpinWaitInstCount)) { >> >> Should these two be set independently? If I pass `-XX:OnSpinWaitInstCount=2` then `OnSpinWaitInst` will default to "none". > > Hi Nick, > Thank you for reviewing the PR. > >> Should these two be set independently? > > I don't mind. Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/6415 From thartmann at openjdk.java.net Wed Nov 17 12:42:37 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 17 Nov 2021 12:42:37 GMT Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 11:41:04 GMT, Tobias Hartmann wrote: > Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment: > https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541 > > The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though because C2 might still compile that path when not being able to prove that it's unreachable at parse time. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in ` VectorSupport::vop2ideal`. 
We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0: > https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394 > > Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug? > > Thanks, > Tobias Thanks for the review, Christian! ------------- PR: https://git.openjdk.java.net/jdk/pull/6428 From aph at openjdk.java.net Wed Nov 17 13:47:46 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Nov 2021 13:47:46 GMT Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 [v3] In-Reply-To: References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: On Wed, 17 Nov 2021 12:31:10 GMT, Evgeny Astigeevich wrote: >> One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. >> >> Testing: >> - `make test TEST=gtest`: Passed >> - `make run-test TEST=tier1`: Passed >> - `make run-test TEST=tier2`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently Did we establish that this is the right default for Neoverse N1? I know that we've found a benchmark where it's a win, but I'm not sure that's the same thing. On the other hand, do we know of possible cases where ISB makes things worse?
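For readers following along, the Java-level call whose AArch64 expansion these flags select is `Thread.onSpinWait()`. A minimal spin loop using it might look like the sketch below. The `SpinWaitDemo` class is hypothetical; it illustrates the API only and says nothing about whether `isb` is the better choice on a given core.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SpinWaitDemo {
    // Spins until another thread sets the flag, hinting the CPU on every iteration.
    static boolean spinUntilSet(AtomicBoolean flag) {
        Thread setter = new Thread(() -> flag.set(true));
        setter.start();
        while (!flag.get()) {
            Thread.onSpinWait(); // a hint only; the JIT may expand it to a CPU-specific instruction
        }
        try {
            setter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return flag.get();
    }

    public static void main(String[] args) {
        System.out.println(spinUntilSet(new AtomicBoolean(false)));
    }
}
```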
------------- PR: https://git.openjdk.java.net/jdk/pull/6415 From hseigel at openjdk.java.net Wed Nov 17 14:28:38 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Wed, 17 Nov 2021 14:28:38 GMT Subject: RFR: 8276177: nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with "assert(def_ik->is_being_redefined()) failed: should be being redefined to get here" In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 13:29:08 GMT, Coleen Phillimore wrote: > The boolean AND was sharing a flag with another thread, so the value of _misc_is_being_redefined was being set and reset with the other thread. Moved the bit to AccessFlags which has space and an atomic set operation. > Tested with tier1-6, 7-8 in progress. Looks Good! Thanks for doing this. Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6410 From jorn.vernee at oracle.com Wed Nov 17 14:48:37 2021 From: jorn.vernee at oracle.com (Jorn Vernee) Date: Wed, 17 Nov 2021 15:48:37 +0100 Subject: Questions about oop handling for Panama upcalls. In-Reply-To: References: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com> Message-ID: <9e2fa731-ff0c-3497-eda0-2ca394a1f33b@oracle.com> Hi Erik, Thanks for the suggestion. The callee is a mix of JDK internal and user code. The user gives us a method handle that they want to turn into a native function pointer [1], and we adapt that using method handle combinators [2] to take only primitive arguments according to the registers in which the native calling convention passes arguments (essentially each primitive argument is a register value). The register values are then reconstructed into high-level arguments (through our MH adaptation), and passed to the user code. It's this adapted method handle that we call from the upcall stub. I guess what you're suggesting is that we have some internal Java method like this:
    static ... invoke(long methodHandle, ...) {
        MethodHandle mh = resolveJObject(methodHandle);
        return (...) mh.invokeExact(...);
    }
Which is then called from the upcall stub instead. I think it could work maybe (would have to see how the performance works out), but we have to deal with different signatures, so would have to use bytecode spinning to generate these 'invoke' methods on demand, which seems like maybe it's a worse medicine (in terms of complexity) than adding the correct oop handling in the VM. I would also just like to get a better understanding of how this is supposed to work in the first place (or how it works e.g. in the case of nmethods), since I had to implement the correct oop handling in the past as well when implementing the intrinsics for down calls, and it's probably not the last time I have to deal with something like this... > Our current upcall stubs try to quack like an interpreter in many ways, so that it will look like an i-2-something call. I think you can either try to do the same quacking dance, to pass the oop to the callee So, I suppose interpreter argument oops are handled through another mechanism than OopMaps, maybe something similar to CompiledMethod::preserve_callee_argument_oops? Thanks, Jorn [1] : https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLinker.java#L224 [2] : https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/ProgrammableUpcallHandler.java#L157 On 17/11/2021 10:42, Erik Osterlund wrote: > Hi Jorn, > > So you have a jobject in the caller, resolve it, and then need to pass the oop around as an argument to the callee. Our current upcall stubs try to quack like an interpreter in many ways, so that it will look like an i-2-something call. I think you can either try to do the same quacking dance, to pass the oop to the callee, or alternatively the primary question for me seems to be who is the callee? 
You have a very fixed format for the call, which makes me suspect the callee is some kind of JDK internal code. Another way of dealing with this would be to pass the jobject as a long and just resolve it in the callee instead, if this is indeed JDK internal code. Then this becomes a problem that doesn't need to be solved at all. Just sanity checking. > > /Erik > >> -----Original Message----- >> From: hotspot-dev On Behalf Of Jorn >> Vernee >> Sent: Tuesday, 16 November 2021 18:51 >> To:hotspot-dev at openjdk.java.net >> Subject: Questions about oop handling for Panama upcalls. >> >> Hi, >> >> For panama-foreign upcalls we spin our own upcall stubs that wrap a method >> handle VM entry for the actual upcall. I want to make sure I have the oop >> handling correct on this. >> >> We receive a list of arguments from native code (all primitives, so no oops to >> handle there), and then prefix that list with a MethodHandle oop, before >> calling into the MH's VM entry. The MH oop can be stored in three different >> places: >> >> 1. The MH oop is stored in a global JNI handle, and then resolved right before >> the upcall [1]. >> 2. The MH oop is then stored in the first argument register j_rarg0 for the >> call. >> 3. During a deopt of the callee, the deoptimization code spills the receiver >> (MH oop) into the frame of the upcall stub. (looks like the extending of the >> frame that happens for instance in c2i adapters doesn't make room for the >> receiver?). >> >> I don't think I need to do anything else for 1., but for 2. and 3. there is >> currently no handling. I wanted to ask how those cases should be handled, if >> at all. >> >> I think 2. could in theory be addressed by implementing >> CodeBlob::preserve_callee_argument_oops. Though, it has been working >> fine so far without this, so I'm wondering if this is even needed. 
Is the caller >> or callee responsible for handling argument oops (seems to be caller, from >> looking at CompiledMethod::preserve_callee_argument_oops)? >> Or does the caller just handle the receiver if there is one (since deopt spills >> that into the caller's frame)? The oop offset is passed to an OopClosure in >> CompiledArgumentOopFinder::handle_oop_offset as an oop* [2]. Does the >> argument register get spilled somewhere and the oop needs to be patched >> in place at that address (by the OopClosure)? Or is this just used to mark the >> oop as alive? (in the latter case, the JNI global should be enough I think). >> >> I think 3. could be handled with an OopMap entry at the frame offset where >> the receiver is spilled during a deopt of the callee? Should it be an oop or a >> narrowOop, or does it depend on VM settings? FWIW, the deopt code >> always seems to need a machine word (64 bits) to do the spilling, so I think >> it's an oop? Do I need to zero out that part of the frame when allocating the >> frame so that the GC doesn't mistake some garbage that's in there for an >> oop? >> >> I have a POC patch here for reference [3], that implements the 2 things >> above. This passes our test suite, but I'm not sure about the correctness. >> Looking at what JNI does for upcalls [4], I don't see how e.g. the receiver >> argument that is put on the stack is handled, or what happens when the >> callee deopts (though I think it would just overwrite the value on the stack >> that's there already, since JNI always seems to do interpreted calls, where >> we do compiled calls). But, JNI/the call stub might be special cased >> elsewhere... >> >> Also, the oop is briefly stored in rscratch1 when resolving. I'm interested to >> know when the GC can look at the frame and register state, especially with >> concurrent GCs in mind. I'm assuming it's only during the call to the MH VM >> entry (but the existence of frame::safe_for_sender makes me less sure)? 
>> AFAIK the call counts as a safepoint (with oop map for it typically stored at >> the return offset). At this safepoint, the oop can only be stored at one of the >> 3 places listed at the start. >> >> Thanks, >> Jorn >> >> [1] : >> https://github.com/openjdk/panama-foreign/blob/foreign-jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L416 >> [2] : >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/frame.cpp#L939-L946 >> [3] : >> https://github.com/openjdk/panama-foreign/compare/foreign-memaccess+abi...JornVernee:Deopt_Crash >> [4] : >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L339 From erik.osterlund at oracle.com Wed Nov 17 15:14:06 2021 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Wed, 17 Nov 2021 15:14:06 +0000 Subject: Questions about oop handling for Panama upcalls. In-Reply-To: <9e2fa731-ff0c-3497-eda0-2ca394a1f33b@oracle.com> References: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com> <9e2fa731-ff0c-3497-eda0-2ca394a1f33b@oracle.com> Message-ID: Hi Jorn, In the interpreter world, the expression stack at the call site becomes the locals of the callee. So everything is passed through the stack. So the upcall stub sets things up like an interpreter method would have (quack quack), and calls the i2c adapter if there is an nmethod (quack quack), which will transform the arguments to the compiled convention of the callee. The argument ownership then switches from the caller to the callee, once the callee can manifest on the stack. But if there are safepoints in between, then the caller owns the arguments until its callee manifests. Do you want to avoid the pretend-to-be-the-interpreter step because it is costly in the Panama world to spill arguments to the stack? 
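The "expression stack becomes the callee's locals" convention described above can be pictured with a toy model. This is a sketch under loud assumptions: `ToyInterpreterCall` and its methods are invented for illustration, and HotSpot's interpreter overlaps the caller's stack slots with the callee's locals rather than copying them as done here.

```java
import java.util.Arrays;

class ToyInterpreterCall {
    // The top `argCount` slots of the caller's expression stack become the
    // callee's first locals. A copy is used here purely for simplicity.
    static long[] calleeLocals(long[] callerStack, int sp, int argCount) {
        return Arrays.copyOfRange(callerStack, sp - argCount, sp);
    }

    // A callee that reads its two arguments from locals[0] and locals[1].
    static long addCallee(long[] locals) {
        return locals[0] + locals[1];
    }

    public static void main(String[] args) {
        long[] stack = new long[8];
        int sp = 0;
        stack[sp++] = 40; // caller pushes first argument
        stack[sp++] = 2;  // caller pushes second argument
        long result = addCallee(calleeLocals(stack, sp, 2));
        System.out.println(result); // prints 42
    }
}
```

In this picture every argument travels through memory, which is the cost being asked about when compared with passing arguments in registers under the compiled convention.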
/Erik > -----Original Message----- > From: Jorn Vernee > Sent: Wednesday, 17 November 2021 15:49 > To: Erik Osterlund ; hotspot-dev at openjdk.java.net > Subject: Re: Questions about oop handling for Panama upcalls. > > Hi Erik, > > Thanks for the suggestion. > > The callee is a mix of JDK internal and user code. The user gives us a method > handle that they want to turn into a native function pointer [1], and we adapt > that using method handle combinators [2] to take only primitive arguments > according to the registers in which the native calling convention passes > arguments (essentially each primitive argument is a register value). The > register values are then reconstructed into high-level arguments (through > our MH adaptation), and passed to the user code. It's this adapted method > handle that we call from the upcall stub. > > I guess what you're suggesting is that we have some internal Java method > like this: >
>     static ... invoke(long methodHandle, ...) {
>         MethodHandle mh = resolveJObject(methodHandle);
>         return (...) mh.invokeExact(...);
>     }
> > Which is then called from the upcall stub instead. > > I think it could work maybe (would have to see how the performance works > out), but we have to deal with different signatures, so would have to use > bytecode spinning to generate these 'invoke' methods on demand, which > seems like maybe it's a worse medicine (in terms of complexity) than adding > the correct oop handling in the VM. > > I would also just like to get a better understanding of how this is supposed to > work in the first place (or how it works e.g. in the case of nmethods), since I > had to implement the correct oop handling in the past as well when > implementing the intrinsics for down calls, and it's probably not the last time I > have to deal with something like this... > > > Our current upcall stubs try to quack like an interpreter in many ways, so > that it will look like an i-2-something call.
I think you can either try to do the > same quacking dance, to pass the oop to the callee > > So, I suppose interpreter argument oops are handled through another > mechanism than OopMaps, maybe something similar to > CompiledMethod::preserve_callee_argument_oops? > > Thanks, > Jorn > > [1] : > https://github.com/openjdk/panama-foreign/blob/foreign- > jextract/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLink > er.java#L224 > [2] : > https://github.com/openjdk/panama-foreign/blob/foreign- > jextract/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/Pr > ogrammableUpcallHandler.java#L157 > > On 17/11/2021 10:42, Erik Osterlund wrote: > > Hi Jorn, > > > > So you have a jobject in the caller, resolve it, and then need to pass the > oop around as an argument to the callee. Our current upcall stubs try to > quack like an interpreter in many ways, so that it will look like an i-2- > something call. I think you can either try to do the same quacking dance, to > pass the oop to the callee, or alternatively the primary question for me > seems to be who is the callee? You have a very fixed format for the call, > which makes me suspect the callee is some kind of JDK internal code. > Another way of dealing with this would be to pass the jobject as a long and > just resolve it in the callee instead, if this is indeed JDK internal code. Then > this becomes a problem that doesn't need to be solved at all. Just sanity > checking. > > > > /Erik > > > >> -----Original Message----- > >> From: hotspot-dev On Behalf Of > >> Jorn Vernee > >> Sent: Tuesday, 16 November 2021 18:51 To:hotspot- > dev at openjdk.java.net > >> Subject: Questions about oop handling for Panama upcalls. > >> > >> Hi, > >> > >> For panama-foreign upcalls we spin our own upcall stubs that wrap a > >> method handle VM entry for the actual upcall. I want to make sure I > >> have the oop handling correct on this. 
> >> > >> We receive a list of arguments from native code (all primitives, so > >> no oops to handle there), and then prefix that list with a > >> MethodHandle oop, before calling into the MH's VM entry. The MH oop > >> can be stored in three different > >> places: > >> > >> 1. The MH oop is stored in a global JNI handle, and then resolved > >> right before the upcall [1]. > >> 2. The MH oop is then stored in the first argument register j_rarg0 > >> for the call. > >> 3. During a deopt of the callee, the deoptimization code spills the > >> receiver (MH oop) into the frame of the upcall stub. (looks like the > >> extending of the frame that happens for instance in c2i adapters > >> doesn't make room for the receiver?). > >> > >> I don't think I need to do anything else for 1., but for 2. and 3. > >> there is currently no handling. I wanted to ask how those cases > >> should be handled, if at all. > >> > >> I think 2. could in theory be addressed by implementing > >> CodeBlob::preserve_callee_argument_oops. Though, it has been working > >> fine so far without this, so I'm wondering if this is even needed. Is > >> the caller or callee responsible for handling argument oops (seems to > >> be caller, from looking at > CompiledMethod::preserve_callee_argument_oops)? > >> Or does the caller just handle the receiver if there is one (since > >> deopt spills that into the callers frame)? The oop offset is passed > >> to an OopClosure in CompiledArgumentOopFinder::handle_oop_offset as > >> an oop* [2]. Does the argument register get spilled somewhere and the > >> oop needs to be patched in place at that address (by the OopClosure)? > >> Or is this just used to mark the oop as alive? (in the latter case, the JNI > global should be enough I think). > >> > >> I think 3. could be handled with an OopMap entry at the frame offset > >> where the receiver is spilled during a deopt of the callee? Should it > >> be an oop or a narrowOop, or does it depend on VM settings? 
FWIW, the > >> deopt code always seems to need a machine word (64-bits) to do the > >> spilling, so I think it's an oop? Do I need to zero out that part of > >> the frame when allocating the frame so that the GC doesn't mistake > >> some garbage that's in there for an oop? > >> > >> I have a POC patch here for reference [3], that implements the 2 > >> things above. This passes our test suite, but I'm not sure about the > correctness. > >> Looking at what JNI does for upcalls [4], I don't see how e.g. the > >> receiver argument that is put on the stack is handled, or what > >> happens when the callee deopts (though I think it would just > >> overwrite the value on the stack that's there already, since JNI > >> always seems to do interpreted calls, where we do compiled calls). > >> But, JNI/the call stub might be special cased elsewhere... > >> > >> Also, the oop is briefly stored in rscratch1 when resolving. I'm > >> interested to know when the GC can look at the frame and register > >> state, especially with concurrent GCs in mind. I'm assuming it's only > >> during the call to the MH VM entry (but the existence of > frame::safe_for_sender makes me less sure)? > >> AFAIK the call counts as a safepoint (with oop map for it typically > >> stored at the return offset). At this safepoint, the oop can only be > >> stored at one of the > >> 3 places listed at the start. 
> >> > >> Thanks, > >> Jorn > >> > >> [1] : > >> https://github.com/openjdk/panama-foreign/blob/foreign- > >> jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L > >> 416 > >> [2] : > >> > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/ > >> fr > >> ame.cpp#L939-L946 > >> [3] : > >> https://github.com/openjdk/panama-foreign/compare/foreign- > >> memaccess+abi...JornVernee:Deopt_Crash > >> [4] : > >> > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGe > >> nerator_x86_64.cpp#L339 From jorn.vernee at oracle.com Wed Nov 17 15:35:16 2021 From: jorn.vernee at oracle.com (Jorn Vernee) Date: Wed, 17 Nov 2021 16:35:16 +0100 Subject: Questions about oop handling for Panama upcalls. In-Reply-To: References: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com> <9e2fa731-ff0c-3497-eda0-2ca394a1f33b@oracle.com> Message-ID: <89b42995-1504-d3cc-1d37-595610b75801@oracle.com> On 17/11/2021 16:14, Erik Osterlund wrote: > Hi Jorn, > > In the interpreter world, the expression stack at the call site becomes the locals > of the callee. So everything is passed through the stack. So the upcall stub sets > things up like an interpreter method would have (quack quack), and calls the > i2c adapter if there is an nmethod (quack quack), which will transform the > arguments to the compiled convention of the callee. The argument ownership > then switches from the caller to the callee, once the callee can manifest on the > stack. But if there are safepoints inbetween, then the caller owns the arguments > until its callee manifests. Okay, thanks, that makes sense. This probably explains why not implementing preserve_callee_argument_oops for the upcall stubs didn't cause any problems so far. There probably just weren't any safepoints in between the call from the stub and the callee setting up it's frame. (although I'm still a bit confused here why the callee doesn't make space for the receiver in it's frame as well). 
> Do you want to avoid the pretend to be the interpreter step because it is costly > in the Panama world to spill arguments to the stack? I think either one could "work", although it seems like interpreter calls require more setup of meta data around calls (which would be unneeded if we called into an nmethod I think?). Also, we generate an argument shuffle from the native convention to the Java calling convention (this is unavoidable). If the native convention passes arguments in the same registers that the Java convention expects them in we don't have to generate code for that in the shuffle. Theoretically we could also do a pass to minimize the needed shuffle by reordering parameters on the MethodHandle. If we went with an interpreted calling convention, we would always have to copy across arguments to the stack, in a shuffle-ish manner (right now we rely on SharedRuntime::java_calling_convention to compute the target registers. Would have to implement something similar for the interpreter convention). It seems to me that in the long run, going with the Java compiled calling convention for the upcall is the right choice if we want to be able to squeeze out as much speed as possible. Jorn > > /Erik > >> -----Original Message----- >> From: Jorn Vernee >> Sent: Wednesday, 17 November 2021 15:49 >> To: Erik Osterlund ; hotspot- >> dev at openjdk.java.net >> Subject: Re: Questions about oop handling for Panama upcalls. >> >> Hi Erik, >> >> Thanks for the suggestion. >> >> The callee is a mix of JDK internal and user code. The user gives us a method >> handle that they want to turn into a native function pointer [1], and we adapt >> that using method handle combinators [2] to take only primitve arguments >> according to the registers in which the native calling convention passes >> arguments (essentially each primitive argument is a register value). 
The >> register values are then reconstructed into high-level arguments (through >> our MH adaptation), and passed to the user code. It's this adapted method >> handle that we call from the upcall stub. >> >> I guess what you're suggesting is that we have some internal Java method >> like this:
>>
>>     static ... invoke(long methodHandle, ...) {
>>         MethodHandle mh = resolveJObject(methodHandle);
>>         return (...) mh.invokeExact(...);
>>     }
>>
>> Which is then called from the upcall stub instead. >> >> I think it could work maybe (would have to see how the performance works >> out), but we have to deal with different signatures, so would have to use >> bytecode spinning to generate these 'invoke' methods on demand, which >> seems like maybe it's a worse medicine (in terms of complexity) than adding >> the correct oop handling in the VM. >> >> I would also just like to get a better understanding of how this is supposed to >> work in the first place (or how it works e.g. in the case of nmethods), since I >> had to implement the correct oop handling in the past as well when >> implementing the intrinsics for down calls, and it's probably not the last time I >> have to deal with something like this... >> >> > Our current upcall stubs try to quack like an interpreter in many ways, so >> that it will look like an i-2-something call. I think you can either try to do the >> same quacking dance, to pass the oop to the callee >> >> So, I suppose interpreter argument oops are handled through another >> mechanism than OopMaps, maybe something similar to >> CompiledMethod::preserve_callee_argument_oops? 
>> >> Thanks, >> Jorn >> >> [1] : >> https://github.com/openjdk/panama-foreign/blob/foreign- >> jextract/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLink >> er.java#L224 >> [2] : >> https://github.com/openjdk/panama-foreign/blob/foreign- >> jextract/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/Pr >> ogrammableUpcallHandler.java#L157 >> >> On 17/11/2021 10:42, Erik Osterlund wrote: >>> Hi Jorn, >>> >>> So you have a jobject in the caller, resolve it, and then need to pass the >> oop around as an argument to the callee. Our current upcall stubs try to >> quack like an interpreter in many ways, so that it will look like an i-2- >> something call. I think you can either try to do the same quacking dance, to >> pass the oop to the callee, or alternatively the primary question for me >> seems to be who is the callee? You have a very fixed format for the call, >> which makes me suspect the callee is some kind of JDK internal code. >> Another way of dealing with this would be to pass the jobject as a long and >> just resolve it in the callee instead, if this is indeed JDK internal code. Then >> this becomes a problem that doesn't need to be solved at all. Just sanity >> checking. >>> /Erik >>> >>>> -----Original Message----- >>>> From: hotspot-dev On Behalf Of >>>> Jorn Vernee >>>> Sent: Tuesday, 16 November 2021 18:51 To:hotspot- >> dev at openjdk.java.net >>>> Subject: Questions about oop handling for Panama upcalls. >>>> >>>> Hi, >>>> >>>> For panama-foreign upcalls we spin our own upcall stubs that wrap a >>>> method handle VM entry for the actual upcall. I want to make sure I >>>> have the oop handling correct on this. >>>> >>>> We receive a list of arguments from native code (all primitives, so >>>> no oops to handle there), and then prefix that list with a >>>> MethodHandle oop, before calling into the MH's VM entry. The MH oop >>>> can be stored in three different >>>> places: >>>> >>>> 1. 
The MH oop is stored in a global JNI handle, and then resolved >>>> right before the upcall [1]. >>>> 2. The MH oop is then stored in the first argument register j_rarg0 >>>> for the call. >>>> 3. During a deopt of the callee, the deoptimization code spills the >>>> receiver (MH oop) into the frame of the upcall stub. (looks like the >>>> extending of the frame that happens for instance in c2i adapters >>>> doesn't make room for the receiver?). >>>> >>>> I don't think I need to do anything else for 1., but for 2. and 3. >>>> there is currently no handling. I wanted to ask how those cases >>>> should be handled, if at all. >>>> >>>> I think 2. could in theory be addressed by implementing >>>> CodeBlob::preserve_callee_argument_oops. Though, it has been working >>>> fine so far without this, so I'm wondering if this is even needed. Is >>>> the caller or callee responsible for handling argument oops (seems to >>>> be caller, from looking at >> CompiledMethod::preserve_callee_argument_oops)? >>>> Or does the caller just handle the receiver if there is one (since >>>> deopt spills that into the callers frame)? The oop offset is passed >>>> to an OopClosure in CompiledArgumentOopFinder::handle_oop_offset as >>>> an oop* [2]. Does the argument register get spilled somewhere and the >>>> oop needs to be patched in place at that address (by the OopClosure)? >>>> Or is this just used to mark the oop as alive? (in the latter case, the JNI >> global should be enough I think). >>>> I think 3. could be handled with an OopMap entry at the frame offset >>>> where the receiver is spilled during a deopt of the callee? Should it >>>> be an oop or a narrowOop, or does it depend on VM settings? FWIW, the >>>> deopt code always seems to need a machine word (64-bits) to do the >>>> spilling, so I think it's an oop? Do I need to zero out that part of >>>> the frame when allocating the frame so that the GC doesn't mistake >>>> some garbage that's in there for an oop? 
>>>> >>>> I have a POC patch here for reference [3], that implements the 2 >>>> things above. This passes our test suite, but I'm not sure about the >> correctness. >>>> Looking at what JNI does for upcalls [4], I don't see how e.g. the >>>> receiver argument that is put on the stack is handled, or what >>>> happens when the callee deopts (though I think it would just >>>> overwrite the value on the stack that's there already, since JNI >>>> always seems to do interpreted calls, where we do compiled calls). >>>> But, JNI/the call stub might be special cased elsewhere... >>>> >>>> Also, the oop is briefly stored in rscratch1 when resolving. I'm >>>> interested to know when the GC can look at the frame and register >>>> state, especially with concurrent GCs in mind. I'm assuming it's only >>>> during the call to the MH VM entry (but the existence of >> frame::safe_for_sender makes me less sure)? >>>> AFAIK the call counts as a safepoint (with oop map for it typically >>>> stored at the return offset). At this safepoint, the oop can only be >>>> stored at one of the >>>> 3 places listed at the start. 
>>>> >>>> Thanks, >>>> Jorn >>>> >>>> [1] : >>>> https://github.com/openjdk/panama-foreign/blob/foreign- >>>> jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L >>>> 416 >>>> [2] : >>>> >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/ >>>> fr >>>> ame.cpp#L939-L946 >>>> [3] : >>>> https://github.com/openjdk/panama-foreign/compare/foreign- >>>> memaccess+abi...JornVernee:Deopt_Crash >>>> [4] : >>>> >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGe >>>> nerator_x86_64.cpp#L339 From shade at openjdk.java.net Wed Nov 17 15:40:36 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 17 Nov 2021 15:40:36 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v3] In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 18:03:00 GMT, Serguei Spitsyn wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> More reviews > > Marked as reviewed by sspitsyn (Reviewer). > Thank you, @sspitsyn! Any more reviews, anyone? No other reviews? I'd like to integrate this soon. ------------- PR: https://git.openjdk.java.net/jdk/pull/5848 From duke at openjdk.java.net Wed Nov 17 15:46:38 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 17 Nov 2021 15:46:38 GMT Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 [v3] In-Reply-To: References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: <6dPmyx5EBbz9tN_rgpgcCx6u7v5CJsswOsB0qpEkDKY=.4e4d5f98-df5b-47df-9885-f4cdc84a48d3@github.com> On Wed, 17 Nov 2021 12:31:10 GMT, Evgeny Astigeevich wrote: >> One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. 
>> >> Testing: >> - `make test TEST=gtest`: Passed >> - `make run-test TEST=tier1`: Passed >> - `make run-test TEST=tier2`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently Hi Andrew, Thank you for reviewing. > Did we establish that this is the right default for Neoverse N1? This is based on: - MySql: https://bugs.mysql.com/bug.php?id=100664 - MongoDB: https://jira.mongodb.org/browse/WT-6872 - Netty: https://github.com/netty/netty/pull/11677 - Customers' benchmarks and workloads. - Experiments with two and three `ISB` instructions. > On the other hand, do we know of possible cases where ISB makes things worse? `Thread.onSpinWait` makes things worse when synchronisation overhead is not on the critical path. It might not improve performance when there is thread contention. In this case it might not give CPU resources to another thread. This applies to both arm64 and x86_64. 
For example, my x86 system:

    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    CPU(s):              16
    On-line CPU(s) list: 0-15
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    NUMA node(s):        1
    Vendor ID:           GenuineIntel
    CPU family:          6
    Model:               85
    Model name:          Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
    Stepping:            7
    CPU MHz:             3097.588
    BogoMIPS:            4999.99
    Hypervisor vendor:   KVM
    Virtualization type: full
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            1024K
    L3 cache:            36608K

Results of `org.openjdk.bench.java.lang.ThreadOnSpinWaitSharedCounter` with 4 threads running on 2 vCPUs:

- `taskset -c 0-1 build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_onSpinWait -jar build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -f 3 org.openjdk.bench.java.lang.ThreadOnSpinWaitSharedCounter`

    Benchmark                            (maxNum)  (threadCount)  Mode  Cnt   Score   Error  Units
    ThreadOnSpinWaitSharedCounter.trial   1000000              4  avgt   15  45.317 ± 1.741  ms/op

- `taskset -c 0-1 build/linux-x86_64-server-release/images/jdk/bin/java -jar build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -f 3 org.openjdk.bench.java.lang.ThreadOnSpinWaitSharedCounter`

    Benchmark                            (maxNum)  (threadCount)  Mode  Cnt   Score   Error  Units
    ThreadOnSpinWaitSharedCounter.trial   1000000              4  avgt   15  55.530 ± 4.606  ms/op

X86 `PAUSE` based implementation causes 22.5% slowdown. 
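As a rough illustration of the kind of loop being measured, here is a minimal sketch of a turn-taking shared counter whose waiting threads call `Thread.onSpinWait()`. It is written under assumptions and is not the source of `ThreadOnSpinWaitSharedCounter`; the class name `SpinWaitSketch` and the method `run` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the JDK microbenchmark: threads take turns
// incrementing a shared counter and spin while it is not their turn.
class SpinWaitSketch {
    static int run(int threadCount, int maxNum) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                while (true) {
                    int v = counter.get();
                    if (v >= maxNum) {
                        return;                 // all increments are done
                    }
                    if (v % threadCount == id) {
                        counter.set(v + 1);     // only the owner of this turn writes
                    } else {
                        Thread.onSpinWait();    // hint to the CPU that we are busy-waiting
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```

With more spinning threads than CPUs, as in the 4-threads-on-2-vCPUs run above, a waiting thread that keeps spinning can hold a CPU that the thread whose turn it is could otherwise use, which is one plausible reading of the slowdown reported here.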
------------- PR: https://git.openjdk.java.net/jdk/pull/6415 From aph at openjdk.java.net Wed Nov 17 16:36:36 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Nov 2021 16:36:36 GMT Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 [v3] In-Reply-To: References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: <3zBHQL8aaLzwDww80QuHhjFrRvXrwoQIwvJkQeOvUFs=.ed8bdf82-c311-4519-ab08-c447733bdd5f@github.com> On Wed, 17 Nov 2021 12:31:10 GMT, Evgeny Astigeevich wrote: >> One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. >> >> Testing: >> - `make test TEST=gtest`: Passed >> - `make run-test TEST=tier1`: Passed >> - `make run-test TEST=tier2`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently Marked as reviewed by aph (Reviewer). Hi, > > Did we establish that this is the right default for Neoverse N1? > > This is based on: > > * MySql: https://bugs.mysql.com/bug.php?id=100664 > > * MongoDB: https://jira.mongodb.org/browse/WT-6872 > > * Netty: [Use cpu_relax() implementation for aarch64 netty/netty#11677](https://github.com/netty/netty/pull/11677) > > * Customers' benchmarks and workloads. > > * Experiments with two and three `ISB` instructions. OK, I'll buy that. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6415 From phh at openjdk.java.net Wed Nov 17 16:54:40 2021 From: phh at openjdk.java.net (Paul Hohensee) Date: Wed, 17 Nov 2021 16:54:40 GMT Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 [v3] In-Reply-To: References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: On Wed, 17 Nov 2021 12:31:10 GMT, Evgeny Astigeevich wrote: >> One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. >> >> Testing: >> - `make test TEST=gtest`: Passed >> - `make run-test TEST=tier1`: Passed >> - `make run-test TEST=tier2`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently Lgtm. ------------- Marked as reviewed by phh (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6415 From duke at openjdk.java.net Wed Nov 17 16:54:41 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 17 Nov 2021 16:54:41 GMT Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 [v3] In-Reply-To: <3zBHQL8aaLzwDww80QuHhjFrRvXrwoQIwvJkQeOvUFs=.ed8bdf82-c311-4519-ab08-c447733bdd5f@github.com> References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> <3zBHQL8aaLzwDww80QuHhjFrRvXrwoQIwvJkQeOvUFs=.ed8bdf82-c311-4519-ab08-c447733bdd5f@github.com> Message-ID: On Wed, 17 Nov 2021 16:32:45 GMT, Andrew Haley wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently > > Hi, > >> > Did we establish that this is the right default for Neoverse N1? >> >> This is based on: >> >> * MySql: https://bugs.mysql.com/bug.php?id=100664 >> >> * MongoDB: https://jira.mongodb.org/browse/WT-6872 >> >> * Netty: [Use cpu_relax() implementation for aarch64 netty/netty#11677](https://github.com/netty/netty/pull/11677) >> >> * Customers' benchmarks and workloads. >> >> * Experiments with two and three `ISB` instructions. > > OK, I'll buy that. @theRealAph Thank you. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6415 From duke at openjdk.java.net Wed Nov 17 16:54:41 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 17 Nov 2021 16:54:41 GMT Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 [v3] In-Reply-To: References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: <54L9vtPkUc1djo2iQLpaZBjGJpwe4cxaCbrulq5TC7o=.a3cfb114-dc74-45c6-8b69-cfc99813e3f1@github.com> On Wed, 17 Nov 2021 16:50:43 GMT, Paul Hohensee wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently > > Lgtm. @phohensee Thank you. ------------- PR: https://git.openjdk.java.net/jdk/pull/6415 From sspitsyn at openjdk.java.net Wed Nov 17 18:00:40 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Wed, 17 Nov 2021 18:00:40 GMT Subject: RFR: 8276177: nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with "assert(def_ik->is_being_redefined()) failed: should be being redefined to get here" In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 13:29:08 GMT, Coleen Phillimore wrote: > The boolean AND was sharing a flag with another thread, so the value of _misc_is_being_redefined was being set and reset with the other thread. Moved the bit to AccessFlags which has space and an atomic set operation. > Tested with tier1-6, 7-8 in progress. Hi Coleen, Great discovery! The fix looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6410 From coleenp at openjdk.java.net Wed Nov 17 19:57:47 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 17 Nov 2021 19:57:47 GMT Subject: RFR: 8276177: nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with "assert(def_ik->is_being_redefined()) failed: should be being redefined to get here" In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 13:29:08 GMT, Coleen Phillimore wrote: > The boolean AND was sharing a flag with another thread, so the value of _misc_is_being_redefined was being set and reset with the other thread. Moved the bit to AccessFlags which has space and an atomic set operation. > Tested with tier1-6, 7-8 in progress. Thank you Serguei and Harold. ------------- PR: https://git.openjdk.java.net/jdk/pull/6410 From coleenp at openjdk.java.net Wed Nov 17 19:57:48 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 17 Nov 2021 19:57:48 GMT Subject: Integrated: 8276177: nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with "assert(def_ik->is_being_redefined()) failed: should be being redefined to get here" In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 13:29:08 GMT, Coleen Phillimore wrote: > The boolean AND was sharing a flag with another thread, so the value of _misc_is_being_redefined was being set and reset with the other thread. Moved the bit to AccessFlags which has space and an atomic set operation. > Tested with tier1-6, 7-8 in progress. This pull request has now been integrated. 
Changeset: a907b2b1 Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/a907b2b144f2af27392eb7c2f9656fbb1a759618 Stats: 13 lines in 3 files changed: 7 ins; 2 del; 4 mod 8276177: nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with "assert(def_ik->is_being_redefined()) failed: should be being redefined to get here" Reviewed-by: hseigel, sspitsyn ------------- PR: https://git.openjdk.java.net/jdk/pull/6410 From sspitsyn at openjdk.java.net Wed Nov 17 22:37:01 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Wed, 17 Nov 2021 22:37:01 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" Message-ID: The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. The fix is to add a check for is_exiting() status into handshake closure do_thread() early. 
The following handshake closures are fixed by this update: - UpdateForPopTopFrameClosure - SetForceEarlyReturn - SetFramePopClosure ------------- Commit messages: - remove vmTestbase/nsk/jvmti/PopFrame/popframe011 from ProblemList.txt - fix 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with assert Changes: https://git.openjdk.java.net/jdk/pull/6440/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6440&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8266593 Stats: 22 lines in 2 files changed: 10 ins; 6 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/6440.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6440/head:pull/6440 PR: https://git.openjdk.java.net/jdk/pull/6440 From sviswanathan at openjdk.java.net Wed Nov 17 22:57:41 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 17 Nov 2021 22:57:41 GMT Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 11:41:04 GMT, Tobias Hartmann wrote: > Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment: > https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541 > > The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though because C2 might still compile that path when not being able to prove that it's unreachable at parse time. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in `VectorSupport::vop2ideal`. 
We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0: > https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394 > > Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug? > > Thanks, > Tobias Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6428 From mdoerr at openjdk.java.net Wed Nov 17 23:02:48 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 17 Nov 2021 23:02:48 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" In-Reply-To: References: Message-ID: <9_lGsNSJueWi-Q6czgJUI8Ps9RuSK4sW8w4HO8uPfHU=.74eee17c-0d7e-4811-a6c6-fe90f60abd09@github.com> On Wed, 17 Nov 2021 22:21:33 GMT, Serguei Spitsyn wrote: > The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. > The fix is to add a check for is_exiting() status into handshake closure do_thread() early. > There following handshake closures are fixed by this update: > - UpdateForPopTopFrameClosure > - SetForceEarlyReturn > - SetFramePopClosure LGTM. Thanks for fixing it! ------------- Marked as reviewed by mdoerr (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6440 From dlong at openjdk.java.net Thu Nov 18 00:26:37 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 18 Nov 2021 00:26:37 GMT Subject: RFR: 8266368: Inaccurate after_unwind hook in C2 exception handler In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 08:42:32 GMT, Erik Österlund wrote: > When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C. > The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C. Looks good! ------------- Marked as reviewed by dlong (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6405 From lmesnik at openjdk.java.net Thu Nov 18 00:35:41 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Thu, 18 Nov 2021 00:35:41 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 22:21:33 GMT, Serguei Spitsyn wrote: > The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. > The fix is to add a check for is_exiting() status into handshake closure do_thread() early. 
> There following handshake closures are fixed by this update: > - UpdateForPopTopFrameClosure > - SetForceEarlyReturn > - SetFramePopClosure Marked as reviewed by lmesnik (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From ngasson at openjdk.java.net Thu Nov 18 01:31:48 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 18 Nov 2021 01:31:48 GMT Subject: RFR: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 [v3] In-Reply-To: References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: On Wed, 17 Nov 2021 12:31:10 GMT, Evgeny Astigeevich wrote: >> One `ISB` implementation of `Thread.OnSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. >> >> Testing: >> - `make test TEST=gtest`: Passed >> - `make run-test TEST=tier1`: Passed >> - `make run-test TEST=tier2`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Set defaults for OnSpinWaitInst/OnSpinWaitInstCount independently Marked as reviewed by ngasson (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6415 From smarks at openjdk.java.net Thu Nov 18 01:51:07 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Thu, 18 Nov 2021 01:51:07 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization Message-ID: Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. 
This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. ------------- Commit messages: - extraneous newline - Merge branch 'master' into JDK-8276422-disable-finalization-option - Simplify InvalidFinalizationOption test. - Change InvalidFinalizationOption test to driver mode. - Revert extraneous whitespace change to globals.hpp. - Renaming within the test class itself. - Rename invalid finalization option test. - Add test for invalid finalization option syntax or value. - Add @bug line to JFR finalization event test. - Test that no jdk.FinalizationStatistics events are generated when finalization is disabled - ... and 7 more: https://git.openjdk.java.net/jdk/compare/29e552c0...3836cc94 Changes: https://git.openjdk.java.net/jdk/pull/6442/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276422 Stats: 266 lines in 13 files changed: 249 ins; 0 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/6442.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6442/head:pull/6442 PR: https://git.openjdk.java.net/jdk/pull/6442 From dholmes at openjdk.java.net Thu Nov 18 01:59:41 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 01:59:41 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 01:34:36 GMT, Stuart Marks wrote: > Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. Hi Stuart, This all looks fine to me. 
The hotspot part needs a second reviewer (especially as I contributed a chunk of that code :) ). Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6442 From duke at openjdk.java.net Thu Nov 18 02:44:45 2021 From: duke at openjdk.java.net (Fei Gao) Date: Thu, 18 Nov 2021 02:44:45 GMT Subject: Integrated: 8274179: AArch64: Support SVE operations with encodable immediates In-Reply-To: References: Message-ID: On Tue, 26 Oct 2021 01:58:40 GMT, Fei Gao wrote: > for(int i = 0; i < LENGTH; i++) { > c[i] = a[i] + 2; > } > > For the case showed above, after superword optimization with SVE, > without the patch, the vector add operation always has 2 z-reg inputs, > like: > mov z16.s, #2 > add z17.s, z17.s, z16.s > > Considering sve has supported basic binary operations with immediate, > this pattern could be further optimized to: > add z16.s, z16.s, #2 > > To implement it, we added some new match rules and assembler rules in > the aarch64 backend. We also made some extensions on immediate types > and functions to keep backward compatible. > > With the patch, only these binary integer vector operations, +(add), > -(sub), &(and), |(orr), and ^(eor) with immediate are supported for > the optimization. Other vector operations are not supported currently. > > Tested tier1 and test/hotspot/jtreg/compiler on SVE featured AArch64 > CPU, no new failure. > > There is no obvious performance uplift but it can help remove one > redundant mov instruction. This pull request has now been integrated. 
Changeset: 81938001 Author: Fei Gao Committer: Ningsheng Jian URL: https://git.openjdk.java.net/jdk/commit/81938001f9bae56c59f4e18b7756089f2cf0bf74 Stats: 1476 lines in 12 files changed: 1329 ins; 43 del; 104 mod 8274179: AArch64: Support SVE operations with encodable immediates Reviewed-by: aph, ngasson ------------- PR: https://git.openjdk.java.net/jdk/pull/6115 From pli at openjdk.java.net Thu Nov 18 04:03:54 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Thu, 18 Nov 2021 04:03:54 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE Message-ID: Arraycopy partial inlining is a C2 compiler technique that avoids stub call overhead in small-sized arraycopy operations by generating masked vector instructions. So far it works on x86 AVX512 only and this patch enables it on AArch64 with SVE. We add AArch64 matching rule for VectorMaskGenNode and refactor that node a little bit. The major change is moving the element type field into its TypeVectMask bottom type. The reason is that AArch64 vector masks are different for different vector element types. E.g., an x86 AVX512 vector mask value masking 3 least significant vector lanes (of any type) is like `0000 0000 ... 0000 0000 0000 0000 0111` On AArch64 SVE, this mask value can only be used for masking the 3 least significant lanes of bytes. But for 3 lanes of ints, the value should be `0000 0000 ... 0000 0000 0001 0001 0001` where the least significant bit of each lane matters. So AArch64 matcher needs to know the vector element type to generate right masks. After this patch, the C2 generated code for copying a 50-byte array on AArch64 SVE looks like mov x12, #0x32 whilelo p0.b, xzr, x12 add x11, x11, #0x10 ld1b {z16.b}, p0/z, [x11] add x10, x10, #0x10 st1b {z16.b}, p0, [x10] We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on both x86 AVX512 and AArch64 SVE machines, no issue is found. 
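To make the mask layout difference described above concrete, here is a small illustrative sketch. It is not code from the patch, and both helper names are hypothetical. It assumes the SVE convention that a predicate register carries one bit per vector byte, with the lowest bit of each element's group governing the lane, whereas an AVX512-style mask uses one bit per lane regardless of element size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: one mask bit per lane (AVX512-style opmask).
static uint64_t per_lane_mask(unsigned active_lanes) {
  assert(active_lanes <= 64);
  return active_lanes == 64 ? ~uint64_t{0}
                            : ((uint64_t{1} << active_lanes) - 1);
}

// Hypothetical helper: one predicate bit per vector byte (SVE-style).
// Only the lowest bit of each elem_bytes-wide group is set for an active lane.
static uint64_t sve_predicate_bits(unsigned active_lanes, unsigned elem_bytes) {
  assert(elem_bytes >= 1 && active_lanes * elem_bytes <= 64);
  uint64_t bits = 0;
  for (unsigned lane = 0; lane < active_lanes; lane++) {
    bits |= uint64_t{1} << (lane * elem_bytes);
  }
  return bits;
}
```

With these definitions, `per_lane_mask(3)` and `sve_predicate_bits(3, 1)` both yield `0x7`, while `sve_predicate_bits(3, 4)` yields `0x111`, which is the `0001 0001 0001` pattern quoted for three int lanes. This is why the matcher needs the element type to produce the right mask.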
We tested JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array size arguments on a 512-bit SVE-featured CPU. We observed the following performance changes.

Benchmark                  (length)  (Performance)
ArrayCopyAligned.testByte        10          -2.6%
ArrayCopyAligned.testByte        20          +4.7%
ArrayCopyAligned.testByte        30          +4.8%
ArrayCopyAligned.testByte        40         +21.7%
ArrayCopyAligned.testByte        50         +22.5%
ArrayCopyAligned.testByte        60         +28.4%

The test machine has an SVE vector size of 512 bits, so we see a performance gain for most array sizes less than 64 bytes. For very small arrays we see a slight regression because a vector load/store may be a bit slower than 1 or 2 scalar loads/stores. ------------- Commit messages: - 8277168: AArch64: Enable arraycopy partial inlining with SVE Changes: https://git.openjdk.java.net/jdk/pull/6444/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6444&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277168 Stats: 87 lines in 16 files changed: 57 ins; 7 del; 23 mod Patch: https://git.openjdk.java.net/jdk/pull/6444.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6444/head:pull/6444 PR: https://git.openjdk.java.net/jdk/pull/6444 From jpai at openjdk.java.net Thu Nov 18 04:17:43 2021 From: jpai at openjdk.java.net (Jaikiran Pai) Date: Thu, 18 Nov 2021 04:17:43 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 01:34:36 GMT, Stuart Marks wrote: > Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above.
src/java.base/share/classes/java/lang/ref/Finalizer.java line 195: > 193: > 194: static { > 195: if (Holder.ENABLED) { Hello Stuart, My understanding of the lazy `Holder` is that it's there to delay the static initialization of the code that's part of the `Holder`. In this case here, the `Holder` is being used right within the `static` block of the `Finalizer` class, that too as the first thing. In this case, is that `Holder` class necessary? ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From smarks at openjdk.java.net Thu Nov 18 05:22:35 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Thu, 18 Nov 2021 05:22:35 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 04:13:21 GMT, Jaikiran Pai wrote: >> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. > > src/java.base/share/classes/java/lang/ref/Finalizer.java line 195: > >> 193: >> 194: static { >> 195: if (Holder.ENABLED) { > > Hello Stuart, > My understanding of the lazy `Holder` is that it's there to delay the static initialization of the code that's part of the `Holder`. In this case here, the `Holder` is being used right within the `static` block of the `Finalizer` class, that too as the first thing. In this case, is that `Holder` class necessary? Huh, good catch! This was mostly left over from an earlier version of the flag that used system properties, which aren't initialized until after the Finalizer class is initialized.
It might be the case that the Holder can be removed at this point, since the finalization-enabled bit is no longer in a system property and is in a native class member that should be available before the VM is started. I say "might" though because this occurs early in system startup, and weird things potentially happen. For example, suppose the first object with a finalizer is created before the Finalizer class is initialized. The VM will perform an upcall to Finalizer::register. An ordinary call to a static method will ensure the class is initialized before proceeding with the call, but this VM upcall is a special case.... I'll have to investigate this some more. ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From dholmes at openjdk.java.net Thu Nov 18 05:43:41 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 05:43:41 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 22:21:33 GMT, Serguei Spitsyn wrote: > The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. > The fix is to add a check for is_exiting() status into handshake closure do_thread() early. > There following handshake closures are fixed by this update: > - UpdateForPopTopFrameClosure > - SetForceEarlyReturn > - SetFramePopClosure Hi Leonid, Something seems amiss to me. First the checks for `java_thread->threadObj() == NULL` should not be necessary as the `threadObj` can never be NULL once it has been started and a non-started thread should not be possible by the time you reach the code doing the checks. Even if we nulled out `threadObj` for a terminated thread the `is_exiting` check would already handle that case. 
Second, if the target thread is exiting then surely the suspension check should return false and so we would already give a JVMTI_ERROR_THREAD_NOT_SUSPENDED error? Thanks, David ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From dholmes at openjdk.java.net Thu Nov 18 06:03:45 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 06:03:45 GMT Subject: RFR: 8276177: nsk/jvmti/RedefineClasses/StressRedefineWithoutBytecodeCorruption failed with "assert(def_ik->is_being_redefined()) failed: should be being redefined to get here" In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 13:29:08 GMT, Coleen Phillimore wrote: > The boolean AND was sharing a flag with another thread, so the value of _misc_is_being_redefined was being set and reset with the other thread. Moved the bit to AccessFlags which has space and an atomic set operation. > Tested with tier1-6, 7-8 in progress. src/hotspot/share/utilities/accessFlags.hpp line 165: > 163: bool is_being_redefined() const { return (_flags & JVM_ACC_IS_BEING_REDEFINED) != 0; } > 164: void set_is_being_redefined() { atomic_set_bits(JVM_ACC_IS_BEING_REDEFINED); } > 165: void clear_is_being_redefined() { atomic_clear_bits(JVM_ACC_IS_BEING_REDEFINED); } Shouldn't these have been under Klass flags, not Klass and Method flags ? ------------- PR: https://git.openjdk.java.net/jdk/pull/6410 From dholmes at openjdk.java.net Thu Nov 18 06:23:42 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 06:23:42 GMT Subject: RFR: 8276795: Deprecate seldom used CDS flags [v2] In-Reply-To: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com> References: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com> Message-ID: On Tue, 16 Nov 2021 15:56:00 GMT, Harold Seigel wrote: >> Please review this small change to deprecate seldom used CDS flags. 
The flags will be deprecated in 18, obsoleted in 19, and removed in a later release. >> >> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. >> >> Thanks, Harold > > Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: > > Add (Deprecated) to comments and add options to deprecated test Sorry for the delay - updates look good. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6390 From sspitsyn at openjdk.java.net Thu Nov 18 06:52:40 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Thu, 18 Nov 2021 06:52:40 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 22:21:33 GMT, Serguei Spitsyn wrote: > The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. > The fix is to add a check for is_exiting() status into handshake closure do_thread() early. > There following handshake closures are fixed by this update: > - UpdateForPopTopFrameClosure > - SetForceEarlyReturn > - SetFramePopClosure Martin and Leonid, thank you for quick review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From pli at openjdk.java.net Thu Nov 18 06:58:40 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Thu, 18 Nov 2021 06:58:40 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote: > Arraycopy partial inlining is a C2 compiler technique that avoids stub > call overhead in small-sized arraycopy operations by generating masked > vector instructions. So far it works on x86 AVX512 only and this patch > enables it on AArch64 with SVE. > > We add AArch64 matching rule for VectorMaskGenNode and refactor that > node a little bit. The major change is moving the element type field > into its TypeVectMask bottom type. The reason is that AArch64 vector > masks are different for different vector element types. > > E.g., an x86 AVX512 vector mask value masking 3 least significant vector > lanes (of any type) is like > > `0000 0000 ... 0000 0000 0000 0000 0111` > > On AArch64 SVE, this mask value can only be used for masking the 3 least > significant lanes of bytes. But for 3 lanes of ints, the value should be > > `0000 0000 ... 0000 0000 0001 0001 0001` > > where the least significant bit of each lane matters. So AArch64 matcher > needs to know the vector element type to generate right masks. > > After this patch, the C2 generated code for copying a 50-byte array on > AArch64 SVE looks like > > mov x12, #0x32 > whilelo p0.b, xzr, x12 > add x11, x11, #0x10 > ld1b {z16.b}, p0/z, [x11] > add x10, x10, #0x10 > st1b {z16.b}, p0, [x10] > > We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on > both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested > JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array > size arguments on a 512-bit SVE-featured CPU. We got below performance > data changes. 
> > Benchmark (length) (Performance) > ArrayCopyAligned.testByte 10 -2.6% > ArrayCopyAligned.testByte 20 +4.7% > ArrayCopyAligned.testByte 30 +4.8% > ArrayCopyAligned.testByte 40 +21.7% > ArrayCopyAligned.testByte 50 +22.5% > ArrayCopyAligned.testByte 60 +28.4% > > The test machine has SVE vector size of 512 bits, so we see performance > gain for most array sizes less than 64 bytes. For very small arrays we > see a bit regression because a vector load/store may be a bit slower > than 1 or 2 scalar loads/stores. The x86 failure is caused by a recent commit (see [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324)) and unrelated to this PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From sspitsyn at openjdk.java.net Thu Nov 18 07:00:47 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Thu, 18 Nov 2021 07:00:47 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" In-Reply-To: References: Message-ID: <3pe7ADvZ3z_slXMHOU3g0kIrhLcsLi0xDIeqIAAmmsM=.27039eee-1c1d-405b-a948-a0bda9acd287@github.com> On Wed, 17 Nov 2021 22:21:33 GMT, Serguei Spitsyn wrote: > The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. > The fix is to add a check for is_exiting() status into handshake closure do_thread() early. > There following handshake closures are fixed by this update: > - UpdateForPopTopFrameClosure > - SetForceEarlyReturn > - SetFramePopClosure Hi David, Thank you for reviewing this! I was also thinking about getting rid of the check `java_thread->threadObj() == NULL`. Then I've decided it is safe to keep it as it was in the original UpdateForPopTopFrameClosure implementation (but later in the code). I will remove it and retest the fix. 
> Second, if the target thread is exiting then surely the suspension check should return > false and so we would already give a JVMTI_ERROR_THREAD_NOT_SUSPENDED error? The assert `assert(java_thread == _state->get_thread(), "Must be");` fires one line before the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` code is returned. ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From thartmann at openjdk.java.net Thu Nov 18 07:05:41 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 18 Nov 2021 07:05:41 GMT Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 11:41:04 GMT, Tobias Hartmann wrote: > Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment: > https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541 > > The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though because C2 might still compile that path when it cannot prove that it's unreachable at parse time. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in `VectorSupport::vop2ideal`. We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0: > https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394 > > Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug?
> > Thanks, > Tobias Thanks for the review, Sandhya! ------------- PR: https://git.openjdk.java.net/jdk/pull/6428 From sspitsyn at openjdk.java.net Thu Nov 18 07:08:15 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Thu, 18 Nov 2021 07:08:15 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v2] In-Reply-To: References: Message-ID: > The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. > The fix is to add a check for is_exiting() status into handshake closure do_thread() early. > There following handshake closures are fixed by this update: > - UpdateForPopTopFrameClosure > - SetForceEarlyReturn > - SetFramePopClosure Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge - remove vmTestbase/nsk/jvmti/PopFrame/popframe011 from ProblemList.txt - fix 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with assert ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6440/files - new: https://git.openjdk.java.net/jdk/pull/6440/files/64f22944..60e784ec Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6440&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6440&range=00-01 Stats: 1850 lines in 27 files changed: 1576 ins; 95 del; 179 mod Patch: https://git.openjdk.java.net/jdk/pull/6440.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6440/head:pull/6440 PR: https://git.openjdk.java.net/jdk/pull/6440 From dholmes at openjdk.java.net Thu Nov 18 07:16:42 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 07:16:42 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 05:20:02 GMT, Stuart Marks wrote: >> src/java.base/share/classes/java/lang/ref/Finalizer.java line 195: >> >>> 193: >>> 194: static { >>> 195: if (Holder.ENABLED) { >> >> Hello Stuart, >> My understanding of the the lazy `Holder` is that it's there to delay the static initialization of the code that's part of the `Holder`. In this case here, the `Holder` is being used right within the `static` block of the `Finalizer` class, that too as the first thing. In this case, is that `Holder` class necessary? > > Huh, good catch! This was mostly left over from an earlier version of the flag that used system properties, which aren't initialized until after the Finalizer class is initialized. > > It might be the case that the Holder can be removed at this point, since the finalization-enabled bit is no longer in a system property and is in a native class member that should be available before the VM is started. 
> > I say "might" though because this occurs early in system startup, and weird things potentially happen. For example, suppose the first object with a finalizer is created before the Finalizer class is initialized. The VM will perform an upcall to Finalizer::register. An ordinary call to a static method will ensure the class is initialized before proceeding with the call, but this VM upcall is a special case.... I'll have to investigate this some more. @stuart-marks not sure I see how anything is different here compared to the existing logic. The `Finalizer` class is explicitly initialized quite early in the init process, but if a preceding class's initialization created an object with a finalizer then that same upcall would be involved. ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From kbarrett at openjdk.java.net Thu Nov 18 07:19:37 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 18 Nov 2021 07:19:37 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization In-Reply-To: References: Message-ID: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com> On Thu, 18 Nov 2021 01:34:36 GMT, Stuart Marks wrote: > Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. I only really reviewed the hotspot changes. There is nothing here to make the various GCs take advantage of finalization being disabled. Is the plan to leave that to followup changes? 
src/hotspot/share/oops/instanceKlass.hpp line 338: > 336: > 337: // Queries finalization state > 338: static bool finalization_enabled() { return _finalization_enabled; } Predicate functions like this are often named "is_xxx"; that idiom is common in this class. src/hotspot/share/prims/jvm.cpp line 694: > 692: > 693: JVM_ENTRY(jboolean, JVM_IsFinalizationEnabled(JNIEnv * env)) > 694: return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE; missing indentation ------------- Changes requested by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6442 From kbarrett at openjdk.java.net Thu Nov 18 07:19:37 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 18 Nov 2021 07:19:37 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization In-Reply-To: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com> References: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com> Message-ID: On Thu, 18 Nov 2021 06:43:01 GMT, Kim Barrett wrote: >> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. > > src/hotspot/share/prims/jvm.cpp line 694: > >> 692: >> 693: JVM_ENTRY(jboolean, JVM_IsFinalizationEnabled(JNIEnv * env)) >> 694: return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE; > > missing indentation I think this could just be `return InstanceKlass::finalization_enabled();`. There is lots of code in this file and elsewhere that assumes C++ `bool` converts to `jboolean` appropriately. 
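The conversion mentioned in the comment above can be illustrated with a standalone sketch. It assumes the usual JNI definitions (`jboolean` as an unsigned 8-bit type, `JNI_TRUE` as 1, `JNI_FALSE` as 0), reproduced locally so the snippet compiles without `jni.h`. The function `finalization_enabled_stub` and the variable `g_finalization_enabled` are hypothetical stand-ins for `InstanceKlass::finalization_enabled()` and its backing field.

```cpp
#include <cassert>

// Local stand-ins for the JNI definitions (assumed to match jni.h).
typedef unsigned char jboolean;
#define JNI_FALSE 0
#define JNI_TRUE 1

// Hypothetical stand-in for InstanceKlass::finalization_enabled().
static bool g_finalization_enabled = true;
static bool finalization_enabled_stub() { return g_finalization_enabled; }

// Explicit form, as written in the patch under review.
static jboolean is_enabled_explicit() {
  return finalization_enabled_stub() ? JNI_TRUE : JNI_FALSE;
}

// Implicit form: the standard integral conversion maps true to 1 and false to 0.
static jboolean is_enabled_implicit() {
  return finalization_enabled_stub();
}
```

Both forms return the same value for either state of the flag, so the choice between them is a matter of style rather than behavior.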
------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From shade at openjdk.java.net Thu Nov 18 07:32:43 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Nov 2021 07:32:43 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 01:34:36 GMT, Stuart Marks wrote: > Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. From the brief look, it is OK. Minor nits. src/hotspot/share/prims/jvm.cpp line 694: > 692: > 693: JVM_ENTRY(jboolean, JVM_IsFinalizationEnabled(JNIEnv * env)) > 694: return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE; Suggestion: return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE; ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From shade at openjdk.java.net Thu Nov 18 07:32:44 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Nov 2021 07:32:44 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 07:13:55 GMT, David Holmes wrote: >> Huh, good catch! This was mostly left over from an earlier version of the flag that used system properties, which aren't initialized until after the Finalizer class is initialized. >> >> It might be the case that the Holder can be removed at this point, since the finalization-enabled bit is no longer in a system property and is in a native class member that should be available before the VM is started.
>> >> I say "might" though because this occurs early in system startup, and weird things potentially happen. For example, suppose the first object with a finalizer is created before the Finalizer class is initialized. The VM will perform an upcall to Finalizer::register. An ordinary call to a static method will ensure the class is initialized before proceeding with the call, but this VM upcall is a special case.... I'll have to investigate this some more. > > @stuart-marks not sure I see how anything is different here compared to the existing logic. The `Finalizer` class is explicitly initialized quite early in the init process, but if a preceding class's initialization created an object with a finalizer then that same upcall would be involved. Do we even have to have a flag on Java side? It looks like these calls are only done as the upcalls from VM, so we might just keep the flag on VM side? ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From dholmes at openjdk.java.net Thu Nov 18 07:37:42 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 07:37:42 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 07:08:15 GMT, Serguei Spitsyn wrote: >> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. >> The fix is to add a check for is_exiting() status into handshake closure do_thread() early. >> There following handshake closures are fixed by this update: >> - UpdateForPopTopFrameClosure >> - SetForceEarlyReturn >> - SetFramePopClosure > > Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge > - remove vmTestbase/nsk/jvmti/PopFrame/popframe011 from ProblemList.txt > - fix 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with assert Wouldn't it suffice to just move the assert then? ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From dholmes at openjdk.java.net Thu Nov 18 07:43:35 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 07:43:35 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 07:27:30 GMT, Aleksey Shipilev wrote: >> @stuart-marks not sure I see how anything is different here compared to the existing logic. The `Finalizer` class is explicitly initialized quite early in the init process, but if a preceding class's initialization created an object with a finalizer then that same upcall would be involved. > > Do we even have to have a flag on Java side? It looks like these calls are only done as the upcalls from VM, so we might just keep the flag on VM side? @shipilev not sure what you mean by "a flag on the Java side". The Java code just queries the VM for the finalization enabled/disabled state and uses that to control things. ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From shade at openjdk.java.net Thu Nov 18 07:46:38 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Nov 2021 07:46:38 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 07:40:34 GMT, David Holmes wrote: >> Do we even have to have a flag on Java side? It looks like these calls are only done as the upcalls from VM, so we might just keep the flag on VM side? > > @shipilev not sure what you mean by "a flag on the Java side". 
The Java code just queries the VM for the finalization enabled/disabled state and uses that to control things. Yeah, "flag" is `Holder.ENABLED` here. I mean, are Java methods `registerFinalizer` and `runFinalization` called only by VM? If so, can VM check the whole thing on VM side, without going to Java and asking back from there? ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From dholmes at openjdk.java.net Thu Nov 18 07:58:39 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 07:58:39 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization In-Reply-To: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com> References: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com> Message-ID: On Thu, 18 Nov 2021 07:16:56 GMT, Kim Barrett wrote: > There is nothing here to make the various GCs take advantage of finalization being disabled. Is the plan to leave that to followup changes? @kimbarrett I provided the basic VM parts here. I'm not aware of what specifically a GC might optimise if it knows there can be no finalizers, but that seems like something the GC folk should look to providing as a follow up. Thanks. > src/hotspot/share/oops/instanceKlass.hpp line 338: > >> 336: >> 337: // Queries finalization state >> 338: static bool finalization_enabled() { return _finalization_enabled; } > > Predicate functions like this are often named "is_xxx"; that idiom is common in this class. This was intended as an accessor function, similar to `count()` or `offset()` not a query as-in `is_shared_boot_class()`. As it is a boolean field you could convert it to a query instead. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From dholmes at openjdk.java.net Thu Nov 18 07:58:39 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 07:58:39 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 07:44:05 GMT, Aleksey Shipilev wrote: >> @shipilev not sure what you mean by "a flag on the Java side". The Java code just queries the VM for the finalization enabled/disabled state and uses that to control things. > > Yeah, "flag" is `Holder.ENABLED` here. I mean, are Java methods `registerFinalizer` and `runFinalization` called only by VM? If so, can VM check the whole thing on VM side, without going to Java and asking back from there? `registerFinalizer` does not expect to be called and only uses the "flag" as a form of assertion. `runFinalization` is called from Java code. ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From smarks at openjdk.java.net Thu Nov 18 08:04:07 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Thu, 18 Nov 2021 08:04:07 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v2] In-Reply-To: References: Message-ID: > Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. 
Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: Include instanceKlass.hpp in arguments.cpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6442/files - new: https://git.openjdk.java.net/jdk/pull/6442/files/3836cc94..911af0b1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6442.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6442/head:pull/6442 PR: https://git.openjdk.java.net/jdk/pull/6442 From sspitsyn at openjdk.java.net Thu Nov 18 08:29:41 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Thu, 18 Nov 2021 08:29:41 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 07:08:15 GMT, Serguei Spitsyn wrote: >> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. >> The fix is to add a check for is_exiting() status into handshake closure do_thread() early. >> There following handshake closures are fixed by this update: >> - UpdateForPopTopFrameClosure >> - SetForceEarlyReturn >> - SetFramePopClosure > > Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Merge > - remove vmTestbase/nsk/jvmti/PopFrame/popframe011 from ProblemList.txt > - fix 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with assert It does not look right to check other conditions if the JavaThread is exiting. So, I think, the `java_thread->is_exiting()` has to be checked first. Please, let me know if I miss anything. ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From alanb at openjdk.java.net Thu Nov 18 08:42:40 2021 From: alanb at openjdk.java.net (Alan Bateman) Date: Thu, 18 Nov 2021 08:42:40 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 07:44:05 GMT, Aleksey Shipilev wrote: >> @shipilev not sure what you mean by "a flag on the Java side". The Java code just queries the VM for the finalization enabled/disabled state and uses that to control things. > > Yeah, "flag" is `Holder.ENABLED` here. I mean, are Java methods `registerFinalizer` and `runFinalization` called only by VM? If so, can VM check the whole thing on VM side, without going to Java and asking back from there? I think @shipilev asks a good question. This could be done completely in the VM without the changes to j.l.ref.Finalizer. The CLI option is for experimenting, at least in the short term, and should be benign to have the Finalizer thread running, it just won't do anything. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/6442

From thartmann at openjdk.java.net Thu Nov 18 09:25:38 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Thu, 18 Nov 2021 09:25:38 GMT
Subject: RFR: 8266368: Inaccurate after_unwind hook in C2 exception handler
In-Reply-To:
References:
Message-ID:

On Tue, 16 Nov 2021 08:42:32 GMT, Erik Österlund wrote:

> When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C.
> The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C.

Looks good.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6405

From sspitsyn at openjdk.java.net Thu Nov 18 09:34:13 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Thu, 18 Nov 2021 09:34:13 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To:
References:
Message-ID:

> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
> The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
> The following handshake closures are fixed by this update:
> - UpdateForPopTopFrameClosure
> - SetForceEarlyReturn
> - SetFramePopClosure

Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:

  get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6440/files
  - new: https://git.openjdk.java.net/jdk/pull/6440/files/60e784ec..435ab513

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6440&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6440&range=01-02

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6440.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6440/head:pull/6440

PR: https://git.openjdk.java.net/jdk/pull/6440

From aph at openjdk.java.net Thu Nov 18 09:36:37 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 18 Nov 2021 09:36:37 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
In-Reply-To:
References:
Message-ID:

On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote:

> Arraycopy partial inlining is a C2 compiler technique that avoids stub
> call overhead in small-sized arraycopy operations by generating masked
> vector instructions. So far it works on x86 AVX512 only and this patch
> enables it on AArch64 with SVE.
>
> We add AArch64 matching rule for VectorMaskGenNode and refactor that
> node a little bit. The major change is moving the element type field
> into its TypeVectMask bottom type. The reason is that AArch64 vector
> masks are different for different vector element types.
>
> E.g., an x86 AVX512 vector mask value masking 3 least significant vector
> lanes (of any type) is like
>
> `0000 0000 ... 0000 0000 0000 0000 0111`
>
> On AArch64 SVE, this mask value can only be used for masking the 3 least
> significant lanes of bytes.
But for 3 lanes of ints, the value should be
>
> `0000 0000 ... 0000 0000 0001 0001 0001`
>
> where the least significant bit of each lane matters. So AArch64 matcher
> needs to know the vector element type to generate right masks.
>
> After this patch, the C2 generated code for copying a 50-byte array on
> AArch64 SVE looks like
>
>     mov     x12, #0x32
>     whilelo p0.b, xzr, x12
>     add     x11, x11, #0x10
>     ld1b    {z16.b}, p0/z, [x11]
>     add     x10, x10, #0x10
>     st1b    {z16.b}, p0, [x10]
>
> We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on
> both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested
> JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
> size arguments on a 512-bit SVE-featured CPU. We got below performance
> data changes.
>
>     Benchmark                  (length)  (Performance)
>     ArrayCopyAligned.testByte        10          -2.6%
>     ArrayCopyAligned.testByte        20          +4.7%
>     ArrayCopyAligned.testByte        30          +4.8%
>     ArrayCopyAligned.testByte        40         +21.7%
>     ArrayCopyAligned.testByte        50         +22.5%
>     ArrayCopyAligned.testByte        60         +28.4%
>
> The test machine has SVE vector size of 512 bits, so we see performance
> gain for most array sizes less than 64 bytes. For very small arrays we
> see a bit regression because a vector load/store may be a bit slower
> than 1 or 2 scalar loads/stores.

I'll have a look. It'll take me a little time to provision a suitable SVE-enabled AArch64 box.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6444

From dholmes at openjdk.java.net Thu Nov 18 09:58:39 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 18 Nov 2021 09:58:39 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote:

>> The test fails when the target JavaThread has is_exiting() status.
In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
>> The following handshake closures are fixed by this update:
>> - UpdateForPopTopFrameClosure
>> - SetForceEarlyReturn
>> - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

IIUC these cases all require that the target is suspended else it is an error. If the target is_exiting then it is not suspended and therefore there is an error. The suspension check should already handle an exiting thread and so there is no need to explicitly add an is_exiting check.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From stefank at openjdk.java.net Thu Nov 18 10:03:00 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Thu, 18 Nov 2021 10:03:00 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable inline caches
Message-ID: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>

We got a report on the zgc-dev list about a large performance issue affecting ZGC:
https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html

One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:

[17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
[17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms

and while this was happening we got a huge number of ICBufferFull safepoints.

It turns out that we have a 10-year-old bug in the inline cache cleaning code.
This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:

https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103

        CompiledIC *ic = CompiledIC_at(iter.reloc());
        oop ic_oop = ic->cached_oop();
        if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
          // The only exception is compiledICHolder oops which may
          // yet be marked below. (We check this further below).
          if (ic_oop->is_compiledICHolder()) {
            compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
            if (is_alive->do_object_b(
                  cichk_oop->holder_method()->method_holder()) &&
                is_alive->do_object_b(cichk_oop->holder_klass())) {
              continue;
            }
          }
          ic->set_to_clean();
          assert(ic->cached_oop() == NULL,
                 "cached oop in IC should be cleared");
        }
      }

The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:

        CompiledIC *ic = CompiledIC_at(iter.reloc());
        if (ic->is_icholder_call()) {
          // The only exception is compiledICHolder oops which may
          // yet be marked below. (We check this further below).
          CompiledICHolder* cichk_oop = ic->cached_icholder();
          if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
              cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
            continue;
          }
        } else {
          Metadata* ic_oop = ic->cached_metadata();
          if (ic_oop != NULL) {
            if (ic_oop->is_klass()) {
              if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
                continue;
              }
            } else if (ic_oop->is_method()) {
              if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
                continue;
              }
            } else {
              ShouldNotReachHere();
            }
          }
        }
          ic->set_to_clean();
      }

Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
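To make the scoping difference easier to see, here is a minimal Java sketch of the two control flows. It is only a toy model, not HotSpot code: the class `IcCleaningModel`, the enum `Cached` and both method names are hypothetical, and liveness is reduced to a label on the enum constant. `NONE` stands for the case where ic_oop is NULL.

```java
// Toy model of the two cleaning loops quoted in this mail.
// All names here are hypothetical; this is not HotSpot code.
class IcCleaningModel {
    // What an inline cache holds in this simplified model.
    enum Cached { NONE, LIVE_METADATA, DEAD_METADATA, LIVE_ICHOLDER, DEAD_ICHOLDER }

    static boolean isAlive(Cached c) {
        return c == Cached.LIVE_METADATA || c == Cached.LIVE_ICHOLDER;
    }

    static boolean isIcHolder(Cached c) {
        return c == Cached.LIVE_ICHOLDER || c == Cached.DEAD_ICHOLDER;
    }

    // Original flow: set_to_clean sits inside the
    // "non-null and not alive" branch, so NONE is never cleaned.
    static boolean cleansOriginal(Cached c) {
        if (c != Cached.NONE && !isAlive(c)) {
            return true;            // ic->set_to_clean()
        }
        return false;
    }

    // Rewritten flow: every path that does not 'continue'
    // falls through to set_to_clean, including NONE.
    static boolean cleansRewritten(Cached c) {
        if (isIcHolder(c)) {
            if (isAlive(c)) {
                return false;       // continue
            }
        } else {
            if (c != Cached.NONE) {
                if (isAlive(c)) {
                    return false;   // continue
                }
            }
        }
        return true;                // ic->set_to_clean(), reached for NONE too
    }

    public static void main(String[] args) {
        for (Cached c : Cached.values()) {
            System.out.println(c + ": original=" + cleansOriginal(c)
                    + " rewritten=" + cleansRewritten(c));
        }
    }
}
```

In this model the two flows agree for every state except `NONE`, which only the rewritten flow cleans.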
To understand why this is causing the problems we are seeing it's good to start by reading:
https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall

When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code). But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.

But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both the GC and the Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches and exit the class unloading phase and proceed to the phase where memory is reclaimed.
You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.

G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.

I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See the description in:
https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html

I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).

I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.

To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
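When a log contains thousands of such lines, a small helper can do the counting. The following is only a sketch: the class `SafepointLogSummary` and its method are hypothetical, and the regular expression assumes the safepoint line layout shown in the logs in this mail (`Safepoint "ICBufferFull", ... Total: N ns`). Other JDK versions may print a different layout.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts ICBufferFull safepoints in -Xlog:safepoint output
// and sums their "Total" times. Assumes the line layout shown in this mail.
class SafepointLogSummary {
    private static final Pattern IC_BUFFER_FULL =
            Pattern.compile("Safepoint \"ICBufferFull\".*Total:\\s+(\\d+) ns");

    // Returns {number of ICBufferFull safepoints, summed total time in ns}.
    static long[] summarize(List<String> lines) {
        long count = 0;
        long totalNs = 0;
        for (String line : lines) {
            Matcher m = IC_BUFFER_FULL.matcher(line);
            if (m.find()) {
                count++;
                totalNs += Long.parseLong(m.group(1));
            }
        }
        return new long[] {count, totalNs};
    }

    public static void main(String[] args) {
        // Two safepoint lines and one GC phase line, taken from the ZGC log in this mail.
        List<String> sample = List.of(
            "[38.557s][1637062062666ms][info ][gc,phases ] GC(222) Concurrent Mark Free 0.001ms",
            "[38.565s][1637062062673ms][info ][safepoint ] Safepoint \"ICBufferFull\", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns",
            "[38.565s][1637062062673ms][info ][safepoint ] Safepoint \"ICBufferFull\", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns");
        long[] result = summarize(sample);
        System.out.println("ICBufferFull safepoints: " + result[0] + ", total ns: " + result[1]);
    }
}
```

To run it over a real log, the lines could be read with `java.nio.file.Files.readAllLines` and passed to `summarize`.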
Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints: [38.557s][1637062062666ms][info ][gc,phases ] GC(222) Concurrent Mark Free 0.001ms [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns [38.566s][1637062062674ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns [38.568s][1637062062677ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns [38.570s][1637062062678ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 15110 ns, Reaching safepoint: 84047 ns, 
At safepoint: 5690 ns, Total: 89737 ns [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns [38.577s][1637062062685ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 13270 
ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns [38.580s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns [38.581s][1637062062690ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns [38.582s][1637062062691ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns [38.585s][1637062062693ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns [38.585s][1637062062694ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns [38.586s][1637062062695ms][info ][safepoint ] Safepoint 
"ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns [38.587s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns [38.589s][1637062062697ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns [38.591s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns 
[38.592s][1637062062701ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns [38.594s][1637062062702ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns [38.595s][1637062062703ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns [38.597s][1637062062705ms][info ][gc,phases ] GC(222) Concurrent Process Non-Strong References 39.443ms Example logs from G1 where the Java threads fixes the cleaned inline caches and run out of ICStubs: [125.998s][1637065197322ms][info ][gc ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms [125.998s][1637065197322ms][info ][gc,cpu ] GC(1040) User=0.08s Sys=0.00s Real=0.01s [125.998s][1637065197322ms][info ][safepoint ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Mark 38.296ms [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets [126.001s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns [126.002s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns [126.007s][1637065197331ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns [126.009s][1637065197334ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns [126.027s][1637065197352ms][info ][gc,marking ] GC(1040) 
Concurrent Rebuild Remembered Sets 29.618ms

I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance. I've also run the patch through tier1-7.

Note that I've made the patch as small as possible to make it easier to backport.

Thanks @fisk for discussion and explanation of the inline caches code.

-------------

Commit messages:
 - Minimize
 - Rewrite
 - Fix guarded by flags

Changes: https://git.openjdk.java.net/jdk/pull/6450/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6450&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277212
  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6450.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6450/head:pull/6450

PR: https://git.openjdk.java.net/jdk/pull/6450

From simonis at openjdk.java.net Thu Nov 18 10:21:01 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 18 Nov 2021 10:21:01 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v10]
In-Reply-To:
References:
Message-ID: <3DyX38fUwXmYfYuInLP-xhm1toijhtr2U7pHK2zhNqU=.b91e17bd-bea6-4323-96e0-03c59e3f0573@github.com>

> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance.
A prominent example of such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a roughly tenfold performance improvement for the above code:
>
> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. > - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Fix jit/t/t105/t105.java to also use -XX:-OptimizeImplicitExceptions in addition to -XX:-OmitStacktracesInFastThrow - Fix IR Framework test Traps::classCheck() which now behaves differently with -XX:+OptimizeImplicitExceptions - Fix build issue for minimal/zero build one more time - Minor enhancements and fixes requested by Martin - Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it. 
- Fix build issue for minimal/zero build
- Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters
- Fix special case where we're creating an implicit exception for a regular invoke* bytecode
- Minor updates as requested by @TheRealMDoerr
- 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5488/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=09
  Stats: 793 lines in 18 files changed: 778 ins; 0 del; 15 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488

PR: https://git.openjdk.java.net/jdk/pull/5488

From eosterlund at openjdk.java.net  Thu Nov 18 10:32:50 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Thu, 18 Nov 2021 10:32:50 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable inline caches
In-Reply-To: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
Message-ID:

On Thu, 18 Nov 2021 09:56:37 GMT, Stefan Karlsson wrote:

> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
>
> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
>
>     [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
>     [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
>
> and while this was happening we got a huge number of ICBufferFull safepoints.
>
> It turns out that we have a 10-year-old bug in the inline cache cleaning code.
This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null: > > https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103 > > CompiledIC *ic = CompiledIC_at(iter.reloc()); > oop ic_oop = ic->cached_oop(); > if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) { > // The only exception is compiledICHolder oops which may > // yet be marked below. (We check this further below). > if (ic_oop->is_compiledICHolder()) { > compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop); > if (is_alive->do_object_b( > cichk_oop->holder_method()->method_holder()) && > is_alive->do_object_b(cichk_oop->holder_klass())) { > continue; > } > } > ic->set_to_clean(); > assert(ic->cached_oop() == NULL, > "cached oop in IC should be cleared"); > } > } > > > The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL: > > CompiledIC *ic = CompiledIC_at(iter.reloc()); > if (ic->is_icholder_call()) { > // The only exception is compiledICHolder oops which may > // yet be marked below. (We check this further below). > CompiledICHolder* cichk_oop = ic->cached_icholder(); > if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) && > cichk_oop->holder_klass()->is_loader_alive(is_alive)) { > continue; > } > } else { > Metadata* ic_oop = ic->cached_metadata(); > if (ic_oop != NULL) { > if (ic_oop->is_klass()) { > if (((Klass*)ic_oop)->is_loader_alive(is_alive)) { > continue; > } > } else if (ic_oop->is_method()) { > if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) { > continue; > } > } else { > ShouldNotReachHere(); > } > } > } > ic->set_to_clean(); > } > > > Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change. 
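The scoping difference described above can be modeled in a few lines. The following is a toy Java sketch, not HotSpot code: `IcCleaningModel`, `InlineCache`, and both methods are hypothetical names, and a `null` `loaderAlive` stands in for `ic->cached_metadata()` returning NULL (the megamorphic vtable or already-clean case). The icholder branch is left out. The second method shows only an assumed shape of "moving the clean call into the null-check scope"; the actual patch may differ.

```java
import java.util.List;

/** Toy model of the scoping issue; every name here is a hypothetical stand-in. */
class IcCleaningModel {

    /** Stand-in for a CompiledIC; loaderAlive == null models cached_metadata() == NULL. */
    static final class InlineCache {
        final Boolean loaderAlive;
        boolean cleaned;

        InlineCache(Boolean loaderAlive) {
            this.loaderAlive = loaderAlive;
        }
    }

    /** Mirrors the rewritten code: the clean call sits outside the null check. */
    static int cleanOuterScope(List<InlineCache> caches) {
        int cleaned = 0;
        for (InlineCache ic : caches) {
            if (ic.loaderAlive != null) {
                if (ic.loaderAlive) {
                    continue; // cached metadata is still alive: keep the cache
                }
            }
            ic.cleaned = true; // also reached when nothing is cached at all
            cleaned++;
        }
        return cleaned;
    }

    /** Assumed shape of the fix: clean only when dead metadata is actually cached. */
    static int cleanInnerScope(List<InlineCache> caches) {
        int cleaned = 0;
        for (InlineCache ic : caches) {
            if (ic.loaderAlive != null) {
                if (ic.loaderAlive) {
                    continue; // cached metadata is still alive: keep the cache
                }
                ic.cleaned = true; // dead metadata: this cache must be cleaned
                cleaned++;
            }
        }
        return cleaned;
    }

    /** One alive cache, one dead cache, two caches with no cached metadata. */
    static List<InlineCache> sample() {
        return List.of(new InlineCache(true), new InlineCache(false),
                       new InlineCache(null), new InlineCache(null));
    }

    public static void main(String[] args) {
        System.out.println("outer scope cleans: " + cleanOuterScope(sample()));
        System.out.println("inner scope cleans: " + cleanInnerScope(sample()));
    }
}
```

In this model the outer-scope variant cleans the two caches without cached metadata in addition to the dead one, which corresponds to the unnecessary cleaning of megamorphic vtable inline caches discussed in this thread.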
>
> To understand why this is causing the problems we are seeing, it's good to start by reading:
> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
>
> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
>
> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
>
> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both the GC and the Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed.
You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
>
> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
>
> I have been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
>
> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
>
> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
>
> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
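As a rough aid for that kind of log inspection, the sketch below counts safepoints of a given name and sums their `Total:` times, and reports the longest "Concurrent Process Non-Strong References" phase. It assumes the line layout seen in the logs quoted in this thread; other JDK versions or `-Xlog` decorations may print these lines differently, so the regular expressions may need adjusting. The class name `SafepointLogSummary` and its methods are made up for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; the patterns assume the log layout quoted in this thread. */
class SafepointLogSummary {

    // e.g.: Safepoint "ICBufferFull", Time since last: ... ns, ..., Total: 174386 ns
    private static final Pattern SAFEPOINT =
            Pattern.compile("Safepoint \"([^\"]+)\".*Total:\\s*(\\d+) ns");

    // e.g.: GC(222) Concurrent Process Non-Strong References 39.443ms
    private static final Pattern NON_STRONG =
            Pattern.compile("Concurrent Process Non-Strong References\\s+([0-9.]+)ms");

    /** Number of matching safepoints and the sum of their "Total" times. */
    static final class Summary {
        final long count;
        final long totalNanos;

        Summary(long count, long totalNanos) {
            this.count = count;
            this.totalNanos = totalNanos;
        }
    }

    /** Counts safepoints with the given name and adds up their total times. */
    static Summary summarize(List<String> lines, String safepointName) {
        long count = 0;
        long totalNanos = 0;
        for (String line : lines) {
            Matcher m = SAFEPOINT.matcher(line);
            if (m.find() && m.group(1).equals(safepointName)) {
                count++;
                totalNanos += Long.parseLong(m.group(2));
            }
        }
        return new Summary(count, totalNanos);
    }

    /** Longest non-strong reference processing phase in ms, or 0 if none is logged. */
    static double maxNonStrongMillis(List<String> lines) {
        double max = 0.0;
        for (String line : lines) {
            Matcher m = NON_STRONG.matcher(line);
            if (m.find()) {
                max = Math.max(max, Double.parseDouble(m.group(1)));
            }
        }
        return max;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SafepointLogSummary.java <gc-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        Summary s = summarize(lines, "ICBufferFull");
        System.out.println("ICBufferFull safepoints: " + s.count
                + ", total ns: " + s.totalNanos);
        System.out.println("Longest non-strong phase (ms): " + maxNonStrongMillis(lines));
    }
}
```

A large ICBufferFull count clustered inside a single long non-strong reference processing phase is the pattern described above; the numbers themselves only have meaning relative to a run without the problem.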
> > Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints: > > [38.557s][1637062062666ms][info ][gc,phases ] GC(222) Concurrent Mark Free 0.001ms > [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns > [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns > [38.566s][1637062062674ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns > [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns > [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns > [38.568s][1637062062677ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns > [38.570s][1637062062678ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns > [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns > [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns > [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns > [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 15110 ns, 
Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns > [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns > [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns > [38.577s][1637062062685ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns > [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns > [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns > [38.579s][1637062062687ms][info 
][safepoint ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns > [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns > [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns > [38.580s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns > [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns > [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns > [38.581s][1637062062690ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns > [38.582s][1637062062691ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns > [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns > [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns > [38.585s][1637062062693ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns > [38.585s][1637062062694ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 
7480 ns, Total: 87648 ns > [38.586s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns > [38.587s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns > [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns > [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns > [38.589s][1637062062697ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns > [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns > [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns > [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns > [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns > [38.591s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns > [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns > [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time 
since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns > [38.592s][1637062062701ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns > [38.594s][1637062062702ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns > [38.595s][1637062062703ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns > [38.597s][1637062062705ms][info ][gc,phases ] GC(222) Concurrent Process Non-Strong References 39.443ms > > > Example logs from G1 where the Java threads fixes the cleaned inline caches and run out of ICStubs: > > [125.998s][1637065197322ms][info ][gc ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms > [125.998s][1637065197322ms][info ][gc,cpu ] GC(1040) User=0.08s Sys=0.00s Real=0.01s > [125.998s][1637065197322ms][info ][safepoint ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns > [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Mark 38.296ms > [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets > [126.001s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns > [126.002s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns > [126.007s][1637065197331ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns > [126.009s][1637065197334ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching 
safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
>     [126.027s][1637065197352ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
>
>
> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
>
> I've run the patch through tier1-7.
>
> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code.

Looks good!

-------------

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6450

From sspitsyn at openjdk.java.net  Thu Nov 18 10:34:40 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Thu, 18 Nov 2021 10:34:40 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To:
References:
Message-ID: <_JMj789jxQAfiksYaaXNDkVxOyYr3bomNH_oUGDaSIk=.6e2435cc-78e5-4306-bfd7-d45f3766e51e@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
>> The following handshake closures are fixed by this update:
>> - UpdateForPopTopFrameClosure
>> - SetForceEarlyReturn
>> - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

It is not correct.
At least, there is this case:

        /* non suspended and exiting thread */
        case 6:
            set_watch_ev(1);                            /* watch JVMTI events */
            popframe_err = (jvmti->PopFrame(frameThr)); /* explode the bomb */
            set_watch_ev(0);                            /* ignore again JVMTI events */
            if (popframe_err != JVMTI_ERROR_THREAD_NOT_SUSPENDED &&
                popframe_err != JVMTI_ERROR_THREAD_NOT_ALIVE) {
                printf("TEST FAILED: the function PopFrame() returned the error %d: %s\n",
                       popframe_err, TranslateError(popframe_err));
                printf("\tBut it should return the error JVMTI_ERROR_THREAD_NOT_SUSPENDED or JVMTI_ERROR_THREAD_NOT_ALIVE.\n");
                return STATUS_FAILED;
            }
            break;
        }

In other cases, the test constructs cases so that the tested thread is alive when expected.
The test was easily failing before in 10th of runs but now it does not fail in 100 runs.
I'll try to run this test 1000 times on all platforms.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From sspitsyn at openjdk.java.net  Thu Nov 18 10:38:40 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Thu, 18 Nov 2021 10:38:40 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To:
References:
Message-ID: <-6jHqTZU-MyvUMDaH_H7GFwBEP84d7IV2vrgLjS2n3w=.fa71006c-959e-4866-be9b-4de8c6525b6f@github.com>

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add a check for is_exiting() status into handshake closure do_thread() early.
>> The following handshake closures are fixed by this update:
>> - UpdateForPopTopFrameClosure
>> - SetForceEarlyReturn
>> - SetFramePopClosure
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

Also, if the target thread is exiting then PopFrame should return the error code `JVMTI_ERROR_THREAD_NOT_ALIVE`, not `JVMTI_ERROR_THREAD_NOT_SUSPENDED`.
It does not matter what this test is expecting.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From pliden at openjdk.java.net  Thu Nov 18 10:48:46 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Thu, 18 Nov 2021 10:48:46 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable inline caches
In-Reply-To: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
Message-ID: <9fmEJXfA_BDBUHlUS8P9XA6ZlwXxGNGggnDdP_0wvKs=.4e471ca9-d6e4-473d-b66b-338dabc1f528@github.com>

On Thu, 18 Nov 2021 09:56:37 GMT, Stefan Karlsson wrote:

> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
>
> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
>
>     [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
>     [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
>
> and while this was happening we got a huge number of ICBufferFull safepoints.
>
> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal.
See how the original code only calls set_to_clean when ic_oop is non-null: > > https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103 > > CompiledIC *ic = CompiledIC_at(iter.reloc()); > oop ic_oop = ic->cached_oop(); > if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) { > // The only exception is compiledICHolder oops which may > // yet be marked below. (We check this further below). > if (ic_oop->is_compiledICHolder()) { > compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop); > if (is_alive->do_object_b( > cichk_oop->holder_method()->method_holder()) && > is_alive->do_object_b(cichk_oop->holder_klass())) { > continue; > } > } > ic->set_to_clean(); > assert(ic->cached_oop() == NULL, > "cached oop in IC should be cleared"); > } > } > > > The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL: > > CompiledIC *ic = CompiledIC_at(iter.reloc()); > if (ic->is_icholder_call()) { > // The only exception is compiledICHolder oops which may > // yet be marked below. (We check this further below). > CompiledICHolder* cichk_oop = ic->cached_icholder(); > if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) && > cichk_oop->holder_klass()->is_loader_alive(is_alive)) { > continue; > } > } else { > Metadata* ic_oop = ic->cached_metadata(); > if (ic_oop != NULL) { > if (ic_oop->is_klass()) { > if (((Klass*)ic_oop)->is_loader_alive(is_alive)) { > continue; > } > } else if (ic_oop->is_method()) { > if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) { > continue; > } > } else { > ShouldNotReachHere(); > } > } > } > ic->set_to_clean(); > } > > > Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change. 
>
> To understand why this is causing the problems we are seeing, it's good to start by reading:
> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
>
> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
>
> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
>
> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both the GC and the Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed.
You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
>
> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
>
> I have been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
>
> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
>
> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
>
> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
> > Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints: > > [38.557s][1637062062666ms][info ][gc,phases ] GC(222) Concurrent Mark Free 0.001ms > [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns > [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns > [38.566s][1637062062674ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns > [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns > [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns > [38.568s][1637062062677ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns > [38.570s][1637062062678ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns > [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns > [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns > [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns > [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 15110 ns, 
Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns > [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns > [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns > [38.577s][1637062062685ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns > [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns > [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns > [38.579s][1637062062687ms][info 
][safepoint ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns > [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns > [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns > [38.580s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns > [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns > [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns > [38.581s][1637062062690ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns > [38.582s][1637062062691ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns > [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns > [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns > [38.585s][1637062062693ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns > [38.585s][1637062062694ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 
7480 ns, Total: 87648 ns > [38.586s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns > [38.587s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns > [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns > [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns > [38.589s][1637062062697ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns > [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns > [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns > [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns > [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns > [38.591s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns > [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns > [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time 
since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns > [38.592s][1637062062701ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns > [38.594s][1637062062702ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns > [38.595s][1637062062703ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns > [38.597s][1637062062705ms][info ][gc,phases ] GC(222) Concurrent Process Non-Strong References 39.443ms > > > Example logs from G1 where the Java threads fixes the cleaned inline caches and run out of ICStubs: > > [125.998s][1637065197322ms][info ][gc ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms > [125.998s][1637065197322ms][info ][gc,cpu ] GC(1040) User=0.08s Sys=0.00s Real=0.01s > [125.998s][1637065197322ms][info ][safepoint ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns > [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Mark 38.296ms > [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets > [126.001s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns > [126.002s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns > [126.007s][1637065197331ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns > [126.009s][1637065197334ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching 
safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns > [126.027s][1637065197352ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms > > > I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance. > > I've run the patch through tier1-7. > > Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code. Looks good! ------------- Marked as reviewed by pliden (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6450 From eosterlund at openjdk.java.net Thu Nov 18 11:20:45 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 18 Nov 2021 11:20:45 GMT Subject: RFR: 8266368: Inaccurate after_unwind hook in C2 exception handler In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 09:22:40 GMT, Tobias Hartmann wrote: >> When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C. >> The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C. > > Looks good. Thanks for the reviews, @TobiHartmann and @dean-long. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6405 From eosterlund at openjdk.java.net Thu Nov 18 11:20:46 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 18 Nov 2021 11:20:46 GMT Subject: Integrated: 8266368: Inaccurate after_unwind hook in C2 exception handler In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 08:42:32 GMT, Erik Österlund wrote: > When we throw an exception and unwind into a frame, the exception handler of that frame needs to call an after_unwind hook for the StackWatermark code, to support concurrent stack processing. Unfortunately, for C2 frames, I inaccurately do this in OptoRuntime::rethrow_C, but the exception handler when unwinding into a C2 frame really is OptoRuntime::handle_exception_C. > The handle_exception_C code does walk frames to the caller though, which also pokes the StackWatermark code. So in the end, there is no real bug here, but it works for the wrong reasons. So I'd like to move the hook in rethrow_C to handle_exception_C. This pull request has now been integrated. 
Changeset: 2c06bca9 Author: Erik Österlund URL: https://git.openjdk.java.net/jdk/commit/2c06bca98fcf9d129d6085e26c225fb26368a558 Stats: 12 lines in 2 files changed: 5 ins; 5 del; 2 mod 8266368: Inaccurate after_unwind hook in C2 exception handler Reviewed-by: dlong, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/6405 From duke at openjdk.java.net Thu Nov 18 11:21:54 2021 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 18 Nov 2021 11:21:54 GMT Subject: Integrated: 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 In-Reply-To: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> References: <-I6bN1jTD9JWI_Nsrf64Yy25zpmeF7Q6Vpe5vSLPKY8=.33084f60-ebd4-4f8f-b56e-408f0fd807b6@github.com> Message-ID: <2f6H9Qd974dKFthSWKyO4AEUME2TGMeJwBZKrCORGUc=.b22c6589-c758-427d-bc3e-d1e84185a38f@github.com> On Tue, 16 Nov 2021 18:14:15 GMT, Evgeny Astigeevich wrote: > One `ISB` implementation of `Thread.onSpinWait` shows performance improvements on Graviton2 (Arm Neoverse N1 implementation), e.g. https://github.com/openjdk/jdk/pull/5562#issuecomment-966153163. > > Testing: > - `make test TEST=gtest`: Passed > - `make run-test TEST=tier1`: Passed > - `make run-test TEST=tier2`: Passed > - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed This pull request has now been integrated. 
Changeset: 38345bd2 Author: Evgeny Astigeevich Committer: Volker Simonis URL: https://git.openjdk.java.net/jdk/commit/38345bd28db83371676f1685806ddc207a833879 Stats: 105 lines in 2 files changed: 105 ins; 0 del; 0 mod 8277137: Set OnSpinWaitInst/OnSpinWaitInstCount defaults to "isb"/1 for Arm Neoverse N1 Reviewed-by: phh, aph, ngasson ------------- PR: https://git.openjdk.java.net/jdk/pull/6415 From dholmes at openjdk.java.net Thu Nov 18 12:56:42 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 12:56:42 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote: >> The test fails when the target JavaThread has is_exiting() status. In such a case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. >> The fix is to add an early check for is_exiting() status to the handshake closures' do_thread(). >> The following handshake closures are fixed by this update: >> - UpdateForPopTopFrameClosure >> - SetForceEarlyReturn >> - SetFramePopClosure > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL It seems somewhat subjective whether a thread that is exiting and thus still on its way to becoming "not alive" needs to report "not alive" versus "not suspended". As there appears to be no synchronization with the target in this case, what stops it from transitioning to "is_exiting" the moment after the "is_exiting" check returns false, but before you hit the assertion? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From dholmes at openjdk.java.net Thu Nov 18 13:11:37 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 13:11:37 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote: >> The test fails when the target JavaThread has is_exiting() status. In such a case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. >> The fix is to add an early check for is_exiting() status to the handshake closures' do_thread(). >> The following handshake closures are fixed by this update: >> - UpdateForPopTopFrameClosure >> - SetForceEarlyReturn >> - SetFramePopClosure > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL Ignore that last question - the target is in a handshake so can't change state. ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From stefank at openjdk.java.net Thu Nov 18 13:16:53 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Thu, 18 Nov 2021 13:16:53 GMT Subject: RFR: 8277397: ZGC: Add JFR event for temporary latency measurements Message-ID: I often measure latencies and stalls using JFR events. I'd like to add an event that can be used for these ad-hoc measurements during development and debugging. 
------------- Commit messages: - 8277397: ZGC: Add JFR event for temporary latency measurements Changes: https://git.openjdk.java.net/jdk/pull/6454/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6454&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277397 Stats: 55 lines in 6 files changed: 55 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6454.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6454/head:pull/6454 PR: https://git.openjdk.java.net/jdk/pull/6454 From hseigel at openjdk.java.net Thu Nov 18 13:22:48 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Thu, 18 Nov 2021 13:22:48 GMT Subject: RFR: 8276795: Deprecate seldom used CDS flags [v2] In-Reply-To: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com> References: <2R0k3TgJwgMkaV2tlOyW8O1cLiB6USFPJ-qvItVBJV0=.7e012487-abf4-4fa3-91c8-74f01d49bbab@github.com> Message-ID: <7239ZzwFhI50QRFnA_pYj26UEQH2mzlltN5Ve4L-sWw=.32fbd078-7628-4632-876b-24d2aaa9cce3@github.com> On Tue, 16 Nov 2021 15:56:00 GMT, Harold Seigel wrote: >> Please review this small change to deprecate seldom used CDS flags. The flags will be deprecated in 18, obsoleted in 19, and removed in a later release. >> >> The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. >> >> Thanks, Harold > > Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: > > Add (Deprecated) to comments and add options to deprecated test Thanks Calvin, Ioi, and David for the reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6390 From hseigel at openjdk.java.net Thu Nov 18 13:22:48 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Thu, 18 Nov 2021 13:22:48 GMT Subject: Integrated: 8276795: Deprecate seldom used CDS flags In-Reply-To: References: Message-ID: On Mon, 15 Nov 2021 14:50:43 GMT, Harold Seigel wrote: > Please review this small change to deprecate seldom used CDS flags. The flags will be deprecated in 18, obsoleted in 19, and removed in a later release. > > The changes were tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux x64. > > Thanks, Harold This pull request has now been integrated. Changeset: b3a62b48 Author: Harold Seigel URL: https://git.openjdk.java.net/jdk/commit/b3a62b48816358ac7dadde4e7893190500ca7b79 Stats: 17 lines in 4 files changed: 8 ins; 0 del; 9 mod 8276795: Deprecate seldom used CDS flags Reviewed-by: dholmes, ccheung, iklam ------------- PR: https://git.openjdk.java.net/jdk/pull/6390 From simonis at openjdk.java.net Thu Nov 18 13:37:49 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 18 Nov 2021 13:37:49 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2] In-Reply-To: References: Message-ID: On Thu, 11 Nov 2021 06:30:15 GMT, Thomas Stuefe wrote: >> This is part of a number of RFE I plan to improve and simplify C-heap overflow checking in hotspot. >> >> This proposal adds NMT buffer overflow checking: >> >> - it gives us C-heap overflow checking in release builds >> - the costs are neglectable: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway. >> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check. 
>> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch. >> >> For more details, please see the JBS issue. >> >> ---- >> >> Patch notes: >> >> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp. >> - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512. >> - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere. >> >> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios. >> >> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting. >> >> - I added a bunch of gtests to test various heap overwrite scenarios. 
I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE). >> >> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now. >> >> - I made the assert for malloc site table width a compile time STATIC_ASSERT. >> >> -------------- >> >> Example output a buffer overrun would provide: >> >> >> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) >> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: >> 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 >> 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00 >> 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)# >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805 >> # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) 
>> # >> >> ------- >> >> Tests: >> - manual tests with Linux x64, x86, minimal build >> - GHAs all clean >> - SAP nightlies ran for 4 weeks now without problems > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge > - Let NMT do overflow detection Hi Thomas, your change looks good. I only have a few remarks and comments inline. Best regards, Volker src/hotspot/share/services/mallocTracker.cpp line 138: > 136: os::print_hex_dump(st, from, to, 1); > 137: assert(bad_address >= from, "sanity"); > 138: // if the corruption is in the block body of in the footer, print out that part too // If the ... body or in the footer... src/hotspot/share/services/mallocTracker.cpp line 143: > 141: from2 = MAX2(to, from2); > 142: address to2 = from2 + 96; > 143: if (to2 > to) { I don't understand this. If `from2 = MAX2(to, from2)` then `from2 >= to`. So shouldn't `to2` (which is `from2 + 96`) always be bigger than `to`? src/hotspot/share/services/mallocTracker.cpp line 169: > 167: // use SafeFetch but since this is a hot path we don't. If we are > 168: // wrong, we will crash when accessing the canary, which hopefully > 169: // generates distinct crash report. No need for two spaces after `//` src/hotspot/share/services/mallocTracker.cpp line 174: > 172: // we check here are the bare minimum of what we know will malloc() give us > 173: // (which is 64-bit even on 32-bit platforms). > 174: if (!is_aligned(this, sizeof(uint64_t))) { Where does this information come from? As far as I can see, the man-page of `malloc()` only mentions: > "malloc returns a pointer which is suitably aligned for any built-in type" Why is this 64 bit on a 32-bit platform? 
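For readers following the alignment question, here is a minimal sketch of the kind of check being discussed, assuming nothing about HotSpot internals. `is_aligned_to` and `looks_like_malloc_pointer` are hypothetical names made up for this illustration; they are not the `is_aligned` helper or the `MallocHeader` code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not HotSpot's is_aligned): true if p is a multiple
// of `alignment`, which must be a non-zero power of two.
inline bool is_aligned_to(const void* p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Minimal plausibility check in the spirit of the quoted line 174:
// a block handed out by malloc() must at least be able to hold a uint64_t,
// so anything less aligned than that cannot be a malloc'ed block start.
inline bool looks_like_malloc_pointer(const void* p) {
  return p != nullptr && is_aligned_to(p, sizeof(std::uint64_t));
}
```

The check is deliberately weaker than what a typical libc guarantees, so a pointer that fails it is almost certainly bogus, while a pointer that passes it is merely plausible.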
src/hotspot/share/services/mallocTracker.hpp line 314: > 312: static const uint8_t _footer_canary_dead_mark = 0xFB; > 313: NOT_LP64(static const uint32_t _header_alt_canary_life_mark = 0xFAFA1F1F;) > 314: NOT_LP64(static const uint32_t _header_alt_canary_dead_mark = 0xFBFB1F1F;) Just out of interest, how did you choose these canary marks? Is there some evidence that they appear less frequently in real code/data than other values? test/hotspot/gtest/nmt/test_nmt_buffer_overflow_detection.cpp line 71: > 69: // this should generate two hex dumps, one with the front header, one with the overwritten > 70: // portion. > 71: static void test_overwrite_back_long() { I think the test isn't really checking that we get two hex dumps, right? ------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From aph at openjdk.java.net Thu Nov 18 13:56:38 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Nov 2021 13:56:38 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote: > Arraycopy partial inlining is a C2 compiler technique that avoids stub > call overhead in small-sized arraycopy operations by generating masked > vector instructions. So far it works on x86 AVX512 only and this patch > enables it on AArch64 with SVE. > > We add AArch64 matching rule for VectorMaskGenNode and refactor that > node a little bit. The major change is moving the element type field > into its TypeVectMask bottom type. The reason is that AArch64 vector > masks are different for different vector element types. > > E.g., an x86 AVX512 vector mask value masking 3 least significant vector > lanes (of any type) is like > > `0000 0000 ... 0000 0000 0000 0000 0111` > > On AArch64 SVE, this mask value can only be used for masking the 3 least > significant lanes of bytes. But for 3 lanes of ints, the value should be > > `0000 0000 ... 
0000 0000 0001 0001 0001` > > where the least significant bit of each lane matters. So AArch64 matcher > needs to know the vector element type to generate right masks. > > After this patch, the C2 generated code for copying a 50-byte array on > AArch64 SVE looks like > > mov x12, #0x32 > whilelo p0.b, xzr, x12 > add x11, x11, #0x10 > ld1b {z16.b}, p0/z, [x11] > add x10, x10, #0x10 > st1b {z16.b}, p0, [x10] > > We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on > both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested > JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array > size arguments on a 512-bit SVE-featured CPU. We got below performance > data changes. > > Benchmark (length) (Performance) > ArrayCopyAligned.testByte 10 -2.6% > ArrayCopyAligned.testByte 20 +4.7% > ArrayCopyAligned.testByte 30 +4.8% > ArrayCopyAligned.testByte 40 +21.7% > ArrayCopyAligned.testByte 50 +22.5% > ArrayCopyAligned.testByte 60 +28.4% > > The test machine has SVE vector size of 512 bits, so we see performance > gain for most array sizes less than 64 bytes. For very small arrays we > see a bit regression because a vector load/store may be a bit slower > than 1 or 2 scalar loads/stores. I'm having a lot of difficulty understanding how this is supposed to work. Firstly, I'm not seeing a performance increase on a fujitsu-fx700. Secondly, I'm not surprised: looking at the results of JMH `-prof:perfasm`, it seems to me that the only SVE instructions being executed are _outside_ the timing loop in the `testByte_ArrayCopyAligned_testByte_jmhTest:avgt_jmhStub` method. I'm baffled by what is going on. 
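The element-type dependence of the mask values quoted from the PR description can be illustrated with a small sketch. It assumes a simplified model in which an SVE predicate carries one bit per vector byte and only the lowest bit of each element's group is significant, limited here to 64 predicate bits (a 512-bit vector). `sve_style_mask` and `avx512_style_mask` are hypothetical names for this illustration, not C2 or assembler functions.

```cpp
#include <cassert>
#include <cstdint>

// SVE-style predicate: one bit per vector byte, so an element of
// `elem_bytes` bytes owns `elem_bytes` bits and only the lowest one is set.
inline std::uint64_t sve_style_mask(unsigned active_lanes, unsigned elem_bytes) {
  assert(elem_bytes == 1 || elem_bytes == 2 || elem_bytes == 4 || elem_bytes == 8);
  assert(active_lanes * elem_bytes <= 64);
  std::uint64_t mask = 0;
  for (unsigned lane = 0; lane < active_lanes; ++lane) {
    mask |= std::uint64_t{1} << (lane * elem_bytes);
  }
  return mask;
}

// AVX-512-style mask: one bit per lane, independent of the element type.
inline std::uint64_t avx512_style_mask(unsigned active_lanes) {
  assert(active_lanes <= 64);
  return active_lanes == 64 ? ~std::uint64_t{0}
                            : (std::uint64_t{1} << active_lanes) - 1;
}
```

Under this model, three active byte lanes give `0b0111` in both styles, while three active int lanes give `0b0001'0001'0001` in the SVE style, matching the two bit patterns in the quoted text.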
------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From stuefe at openjdk.java.net Thu Nov 18 13:58:40 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 18 Nov 2021 13:58:40 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 12:12:17 GMT, Volker Simonis wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge >> - Let NMT do overflow detection > > src/hotspot/share/services/mallocTracker.cpp line 143: > >> 141: from2 = MAX2(to, from2); >> 142: address to2 = from2 + 96; >> 143: if (to2 > to) { > > Don't understand this. If `from2 = MAX2(to, from2)` then `from2 >= to`. So shouldn't `to2` (which is `from2 + 96`) always be bigger then `to`? You are absolutely right, and the code is not very clear either, I'll improve it. ------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From stuefe at openjdk.java.net Thu Nov 18 14:22:46 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 18 Nov 2021 14:22:46 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 12:28:55 GMT, Volker Simonis wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge >> - Let NMT do overflow detection > > src/hotspot/share/services/mallocTracker.cpp line 174: > >> 172: // we check here are the bare minimum of what we know will malloc() give us >> 173: // (which is 64-bit even on 32-bit platforms). 
>> 174: if (!is_aligned(this, sizeof(uint64_t))) { > > Where does this information come from? As far as I can see, the man-page of `malloc()` only mentions: > >> "malloc returns a pointer which is suitably aligned for any built-in type" > > Why is this 64 bit on a 32-bit platform? We know that the alignment has to be *at least* 64-bit since we know we have 64-bit inbuilt types on both 64-bit and 32-bit platforms (`uint64_t`). From experience, I know it is probably more, 16 or 32 bytes. This makes sense since there exist scalar data types larger than 64-bit. But this code tests the *minimal* necessary alignment only and I wanted to prevent false positives. Let's say, in case we happen to run with a libc whose malloc implementation returns only 64-bit aligned pointers. That also could happen if someone put a weird malloc() implementation below us (malloc hooks or LD_PRELOAD). I think the assumption that everything malloc() returns is at least 64-bit aligned is pretty safe though. Ideally, we would have a clear definition of malloc alignment somewhere in `globalDefinitions.hpp`. In hotspot, there are a range of places where such alignment is implicitly assumed. The NMT header size, for instance, or metaspace allocation size, hotspot arena allocation alignment etc. Basically, everywhere where one either marshalls malloc'ed blocks or implements some sort of general purpose allocator. C++ has `std::max_align_t` for that. Maybe we could use that one. But that's a topic for another RFE. I try to improve the comment. > src/hotspot/share/services/mallocTracker.hpp line 314: > >> 312: static const uint8_t _footer_canary_dead_mark = 0xFB; >> 313: NOT_LP64(static const uint32_t _header_alt_canary_life_mark = 0xFAFA1F1F;) >> 314: NOT_LP64(static const uint32_t _header_alt_canary_dead_mark = 0xFBFB1F1F;) > > Just out of interest, how did you choose these canary marks? Is there some evidence that they appear less frequently in real code/data than other values? 
I did an extensive statistical analysis of many core dumps. ... ... Just kidding, I chose them on a whim to be not zero :) Do you have a better suggestion? I thought about making them ASCII pattern, but those are actually more common in payload data. > test/hotspot/gtest/nmt/test_nmt_buffer_overflow_detection.cpp line 71: > >> 69: // this should generate two hex dumps, one with the front header, one with the overwritten >> 70: // portion. >> 71: static void test_overwrite_back_long() { > > I think the test isn't really checking that we get two hex dumps, right? Again, bad wording in the comment. This tests that `MallocHeader::print_block_on_error()` prints a hex dump covering both header and the corruption address, and if both are too far apart, that the dump is split up in two parts. ------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From zgu at openjdk.java.net Thu Nov 18 14:30:44 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 18 Nov 2021 14:30:44 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 14:14:17 GMT, Thomas Stuefe wrote: >> src/hotspot/share/services/mallocTracker.cpp line 174: >> >>> 172: // we check here are the bare minimum of what we know will malloc() give us >>> 173: // (which is 64-bit even on 32-bit platforms). >>> 174: if (!is_aligned(this, sizeof(uint64_t))) { >> >> Where does this information come from? As far as I can see, the man-page of `malloc()` only mentions: >> >>> "malloc returns a pointer which is suitably aligned for any built-in type" >> >> Why is this 64 bit on a 32-bit platform? > > We know that the alignment has to be *at least* 64-bit since we know we have 64-bit inbuilt types on both 64-bit and 32-bit platforms (`uint64_t`). From experience, I know it is probably more, 16 or 32 bytes. This makes sense since there exist scalar data types larger than 64-bit. 
> > But this code tests the *minimal* necessary alignment only and I wanted to prevent false positives. Let's say, in case we happen to run with a libc whose malloc implementation returns only 64-bit aligned pointers. That also could happen if someone put a weird malloc() implementation below us (malloc hooks or LD_PRELOAD). I think the assumption that everything malloc() returns is at least 64-bit aligned is pretty safe though. > > Ideally, we would have a clear definition of malloc alignment somewhere in `globalDefinitions.hpp`. In hotspot, there are a range of places where such alignment is implicitly assumed. The NMT header size, for instance, or metaspace allocation size, hotspot arena allocation alignment etc. Basically, everywhere where one either marshalls malloc'ed blocks or implements some sort of general purpose allocator. C++ has `std::max_align_t` for that. Maybe we could use that one. But that's a topic for another RFE. > > I try to improve the comment. > Where does this information come from? As far as I can see, the man-page of `malloc()` only mentions: > > > "malloc returns a pointer which is suitably aligned for any built-in type" > > Why is this 64 bit on a 32-bit platform? NMT always assumes (from experiments on various platforms) that malloc memory is 2-machine-word aligned, so it is 64-bit aligned on a 32-bit platform. ------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From eosterlund at openjdk.java.net Thu Nov 18 14:36:56 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 18 Nov 2021 14:36:56 GMT Subject: Integrated: 8259643: ZGC can return metaspace OOM prematurely In-Reply-To: References: Message-ID: <6xrNisEJhIfc0YwEy-5Z-xTCeHLjBYJK8X4jei5NtIU=.3ef261e6-3b73-4ca8-a9a7-bc9fff74b9ea@github.com> On Thu, 28 Jan 2021 12:55:55 GMT, Erik Österlund wrote: > There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads.
Towards the end of the allocation dance, we conceptually do this: > > 1. full_gc() > 2. final_allocation_attempt() > > And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw. This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps. > > The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations. > > The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point. > > Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically). This pull request has now been integrated. 
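The queueing idea in the description above can be illustrated with a minimal sketch. It assumes hypothetical names (`MetaspaceSketch`, `enqueue_critical`, `on_gc_completed`, `complete_critical`) and a simple byte counter in place of real metaspace; it is not the code from the changeset, only one way to show queued "critical" requests being served before ordinary allocations can consume freshly reclaimed memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical illustration only; none of these names exist in HotSpot.
class MetaspaceSketch {
 public:
  explicit MetaspaceSketch(std::size_t capacity) : _free(capacity) {}

  // Ordinary allocation: refused while any critical request is still
  // waiting, so it cannot take memory that a queued request needs.
  bool try_allocate(std::size_t size) {
    std::lock_guard<std::mutex> guard(_lock);
    if (!_pending.empty()) {
      return false;
    }
    return take(size);
  }

  // Called by a thread whose ordinary attempts failed, before it
  // triggers a GC. Returns a ticket that identifies the request.
  std::size_t enqueue_critical(std::size_t size) {
    std::lock_guard<std::mutex> guard(_lock);
    _pending.push_back(Request{_next_ticket, size});
    return _next_ticket++;
  }

  // Called when a GC has released `reclaimed` bytes: queued requests
  // are served in FIFO order before anyone else sees the memory.
  void on_gc_completed(std::size_t reclaimed) {
    std::lock_guard<std::mutex> guard(_lock);
    _free += reclaimed;
    while (!_pending.empty() && take(_pending.front().size)) {
      _satisfied.push_back(_pending.front().ticket);
      _pending.pop_front();
    }
  }

  // Final step for the requesting thread. True means the memory was
  // reserved for it; false means the GC did not free enough (real OOM).
  bool complete_critical(std::size_t ticket) {
    std::lock_guard<std::mutex> guard(_lock);
    auto done = std::find(_satisfied.begin(), _satisfied.end(), ticket);
    if (done != _satisfied.end()) {
      _satisfied.erase(done);
      return true;
    }
    auto waiting = std::find_if(
        _pending.begin(), _pending.end(),
        [ticket](const Request& r) { return r.ticket == ticket; });
    if (waiting != _pending.end()) {
      _pending.erase(waiting);
    }
    return false;
  }

 private:
  struct Request {
    std::size_t ticket;
    std::size_t size;
  };

  // Caller must hold _lock.
  bool take(std::size_t size) {
    if (size > _free) {
      return false;
    }
    _free -= size;
    return true;
  }

  std::mutex _lock;
  std::size_t _free;
  std::size_t _next_ticket = 0;
  std::deque<Request> _pending;
  std::vector<std::size_t> _satisfied;
};
```

In this sketch the memory for a critical request is reserved inside `on_gc_completed`, so a thread that is preempted between the GC and its final attempt no longer loses the race to other allocating threads.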
Changeset: 00c388b4 Author: Erik Österlund URL: https://git.openjdk.java.net/jdk/commit/00c388b4aba41d5f0874585e9c0a33c4571805f6 Stats: 297 lines in 6 files changed: 276 ins; 17 del; 4 mod 8259643: ZGC can return metaspace OOM prematurely Reviewed-by: stefank, pliden, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/2289 From coleenp at openjdk.java.net Thu Nov 18 14:43:40 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 18 Nov 2021 14:43:40 GMT Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable inline caches In-Reply-To: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com> References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com> Message-ID: On Thu, 18 Nov 2021 09:56:37 GMT, Stefan Karlsson wrote: > We got a report on the zgc-dev list about a large performance issue affecting ZGC: > https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html > > One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times: > > [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms > [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms > > and while this were happening we got a huge number of ICBufferFull safepoints. > > It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null: > > https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103 > > CompiledIC *ic = CompiledIC_at(iter.reloc()); > oop ic_oop = ic->cached_oop(); > if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) { > // The only exception is compiledICHolder oops which may > // yet be marked below.
(We check this further below). > if (ic_oop->is_compiledICHolder()) { > compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop); > if (is_alive->do_object_b( > cichk_oop->holder_method()->method_holder()) && > is_alive->do_object_b(cichk_oop->holder_klass())) { > continue; > } > } > ic->set_to_clean(); > assert(ic->cached_oop() == NULL, > "cached oop in IC should be cleared"); > } > } > > > The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL: > > CompiledIC *ic = CompiledIC_at(iter.reloc()); > if (ic->is_icholder_call()) { > // The only exception is compiledICHolder oops which may > // yet be marked below. (We check this further below). > CompiledICHolder* cichk_oop = ic->cached_icholder(); > if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) && > cichk_oop->holder_klass()->is_loader_alive(is_alive)) { > continue; > } > } else { > Metadata* ic_oop = ic->cached_metadata(); > if (ic_oop != NULL) { > if (ic_oop->is_klass()) { > if (((Klass*)ic_oop)->is_loader_alive(is_alive)) { > continue; > } > } else if (ic_oop->is_method()) { > if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) { > continue; > } > } else { > ShouldNotReachHere(); > } > } > } > ic->set_to_clean(); > } > > > Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change. > > To understand why this is causing the problems we are seeing it's good to start by reading: > https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall > > When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code). > > But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. 
When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints. > > But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both the GC and the Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly. > > G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java thread concurrently fight over the inline caches.
It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints. > > I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in: > https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html > > I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an oracle-internal stress test). > > I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels. > > To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines. 
> > Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints: > > [38.557s][1637062062666ms][info ][gc,phases ] GC(222) Concurrent Mark Free 0.001ms > [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns > [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns > [38.566s][1637062062674ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns > [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns > [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns > [38.568s][1637062062677ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns > [38.570s][1637062062678ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns > [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns > [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns > [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns > [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 15110 ns, 
Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns > [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns > [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns > [38.577s][1637062062685ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns > [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns > [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns > [38.579s][1637062062687ms][info 
][safepoint ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns > [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns > [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns > [38.580s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns > [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns > [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns > [38.581s][1637062062690ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns > [38.582s][1637062062691ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns > [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns > [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns > [38.585s][1637062062693ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns > [38.585s][1637062062694ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 
7480 ns, Total: 87648 ns > [38.586s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns > [38.587s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns > [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns > [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns > [38.589s][1637062062697ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns > [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns > [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns > [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns > [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns > [38.591s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns > [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns > [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time 
since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns > [38.592s][1637062062701ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns > [38.594s][1637062062702ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns > [38.595s][1637062062703ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns > [38.597s][1637062062705ms][info ][gc,phases ] GC(222) Concurrent Process Non-Strong References 39.443ms > > > Example logs from G1 where the Java threads fixes the cleaned inline caches and run out of ICStubs: > > [125.998s][1637065197322ms][info ][gc ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms > [125.998s][1637065197322ms][info ][gc,cpu ] GC(1040) User=0.08s Sys=0.00s Real=0.01s > [125.998s][1637065197322ms][info ][safepoint ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns > [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Mark 38.296ms > [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets > [126.001s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns > [126.002s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns > [126.007s][1637065197331ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns > [126.009s][1637065197334ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching 
safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns > [126.027s][1637065197352ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms > > > I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance. > > I've tested run the patch through tier1-7. > > Note that I've made patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code. Great job tracking this down! It does look like it was a merge error from the original code that's escaped notice until now. Well done! src/hotspot/share/code/compiledMethod.cpp line 482: > 480: } > 481: } else { > 482: return true; I've given up pretending to understand this code, but could you add a one line comment why you're returning true here? ie. if ic_metadata is NULL, it's a megamorphic call or already clean and shouldn't be cleaned. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6450 From plevart at openjdk.java.net Thu Nov 18 14:56:50 2021 From: plevart at openjdk.java.net (Peter Levart) Date: Thu, 18 Nov 2021 14:56:50 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 08:39:52 GMT, Alan Bateman wrote: >> Yeah, "flag" is `Holder.ENABLED` here. I mean, are Java methods `registerFinalizer` and `runFinalization` called only by VM? If so, can VM check the whole thing on VM side, without going to Java and asking back from there? > > I think @shipilev asks a good question. This could be done completely in the VM without the changes to j.l.ref.Finalizer. The CLI option is for experimenting, at least in the short term, and should be benign to have the Finalizer thread running, it just won't do anything. 
Or, you could move the static initialization block that starts the finalizer thread into the Finalizer.FinalizerThread class itself and then arrange for that class to be initialized explicitly immediately after the Finalizer class, but conditionally, only if the option to disable finalization was not specified... This way the Finalizer class could still be initialized early, but the thread would not be started if it is not needed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From plevart at openjdk.java.net Thu Nov 18 15:08:45 2021 From: plevart at openjdk.java.net (Peter Levart) Date: Thu, 18 Nov 2021 15:08:45 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 14:53:38 GMT, Peter Levart wrote: >> I think @shipilev asks a good question. This could be done completely in the VM without the changes to j.l.ref.Finalizer. The CLI option is for experimenting, at least in the short term, and should be benign to have the Finalizer thread running, it just won't do anything. > > Or, you could move the static initialization block that starts the finalizer thread into the Finalizer.FinalizerThread class itself and then arrange for that class to be initialized explicitly immediately after the Finalizer class, but conditionally, only if the option to disable finalization was not specified... > This way the Finalizer class could still be initialized early, but the thread would not be started if it is not needed. If you then need this "flag" in the assert of registerFinalizer and runFinalization, you could use unsafe.shouldBeInitialized(Finalizer.FinalizerThread.class) as a means to find out whether the flag was set or not... 
------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From stuefe at openjdk.java.net Thu Nov 18 15:25:15 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 18 Nov 2021 15:25:15 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v3] In-Reply-To: References: Message-ID: > This is part of a number of RFEs I plan to improve and simplify C-heap overflow checking in hotspot. > > This proposal adds NMT buffer overflow checking: > > - it gives us C-heap overflow checking in release builds > - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway. > - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check. > - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch. > > For more details, please see the JBS issue. > > ---- > > Patch notes: > > - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp. > - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16 bits arguably still is: malloc site table width is 512. 
> - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere. > > - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios. > > - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting. > > - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE). > > - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now. > > - I made the assert for malloc site table width a compile time STATIC_ASSERT. > > -------------- > > Example output a buffer overrun would provide: > > > Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) 
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: > 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 > 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00 > 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)# > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805 > # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) 
> # > > ------- > > Tests: > - manual tests with Linux x64, x86, minimal build > - GHAs all clean > - SAP nightlies ran for 4 weeks now without problems Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Feedback Volker ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5952/files - new: https://git.openjdk.java.net/jdk/pull/5952/files/e04a105d..a1611e78 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=01-02 Stats: 42 lines in 2 files changed: 19 ins; 5 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/5952.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5952/head:pull/5952 PR: https://git.openjdk.java.net/jdk/pull/5952 From stuefe at openjdk.java.net Thu Nov 18 15:25:20 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 18 Nov 2021 15:25:20 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 13:34:11 GMT, Volker Simonis wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge >> - Let NMT do overflow detection > > Hi Thomas, > > your change looks good. I only have a few remarks and comments inline. > > Best regards, > Volker Thanks a lot @simonis for the review! I massaged the patch a bit, improving comments and rewriting the block print function. I think it's now easier to understand.
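For readers who want a concrete picture of the header and footer canaries from the patch notes, here is a minimal sketch. Everything in it is an assumption for illustration: the struct `SketchHeader`, the functions `sketch_malloc`, `sketch_check` and `sketch_free`, and the canary values are made up and are not the real `MallocHeader` from mallocTracker.hpp. It only shows the general technique: a 16-byte header whose last field is a 16-bit canary directly in front of the payload, a single footer byte behind the payload, and a check that also verifies the minimal 64-bit alignment discussed in the review.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical layout for illustration; not the real NMT malloc header.
struct SketchHeader {
  std::uint64_t size;    // user payload size in bytes
  std::uint32_t flags;   // placeholder for bookkeeping data
  std::uint16_t bucket;  // placeholder for a 16-bit bucket index
  std::uint16_t canary;  // directly precedes the user payload
};
static_assert(sizeof(SketchHeader) == 16, "sketch assumes a 16-byte header");

// Illustrative values only; chosen to be non-zero.
const std::uint16_t kHeaderCanary = 0xFA1F;
const unsigned char kFooterCanary = 0xFA;

enum class BlockState { ok, misaligned, header_broken, footer_broken };

// Allocates header + payload + one footer byte, returns the payload.
void* sketch_malloc(std::size_t size) {
  void* raw = std::malloc(sizeof(SketchHeader) + size + 1);
  if (raw == nullptr) {
    return nullptr;
  }
  new (raw) SketchHeader{static_cast<std::uint64_t>(size), 0, 0, kHeaderCanary};
  unsigned char* payload = static_cast<unsigned char*>(raw) + sizeof(SketchHeader);
  payload[size] = kFooterCanary;
  return payload;
}

// Checks alignment first, then the header canary, then the footer canary.
BlockState sketch_check(const void* payload) {
  const unsigned char* p = static_cast<const unsigned char*>(payload);
  const unsigned char* raw = p - sizeof(SketchHeader);
  if (reinterpret_cast<std::uintptr_t>(raw) % sizeof(std::uint64_t) != 0) {
    return BlockState::misaligned;
  }
  SketchHeader header;
  std::memcpy(&header, raw, sizeof(header));  // copy out, avoids aliasing issues
  if (header.canary != kHeaderCanary) {
    return BlockState::header_broken;
  }
  if (p[header.size] != kFooterCanary) {
    return BlockState::footer_broken;
  }
  return BlockState::ok;
}

void sketch_free(void* payload) {
  std::free(static_cast<unsigned char*>(payload) - sizeof(SketchHeader));
}
```

Writing one byte past the end of a block obtained from `sketch_malloc` makes `sketch_check` report `footer_broken`, and clobbering the byte just in front of the payload reports `header_broken`, which mirrors the two overrun directions the patch notes describe.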
I also extended the dump size somewhat; it now looks like this:

Corruption and header close together:

NMT Block at 0x00005571a7030330, corruption at: 0x00005571a7030341:
0x00005571a70302b0: 30 1c 05 a7 71 55 00 00 60 e5 f3 31 fa 7f 00 00
0x00005571a70302c0: 47 e6 f3 31 fa 7f 00 00 11 e6 f3 31 fa 7f 00 00
0x00005571a70302d0: 84 e6 f3 31 fa 7f 00 00 f1 f1 f1 f1 f1 f1 f1 f1
0x00005571a70302e0: 00 00 00 00 f1 f1 f1 f1 fa ab ab ab ab ab ab ab
0x00005571a70302f0: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x00005571a7030300: ab ba ba ba ba ba ba ba 51 00 00 00 00 00 00 00
0x00005571a7030310: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x00005571a7030320: 12 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00005571a7030330: 01 00 00 00 00 00 00 00 f1 f1 f1 f1 0f 00 1f fa
0x00005571a7030340: f1 61 ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x00005571a7030350: ab ab 00 00 00 00 00 00 61 00 00 00 00 00 00 00
0x00005571a7030360: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x00005571a7030370: 21 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00005571a7030380: 10 00 00 00 00 00 00 00 f1 f1 f1 f1 0b 00 1f fa
0x00005571a7030390: 00 00 00 00 00 00 00 00 d8 a5 01 c8 f9 7f 00 00
0x00005571a70303a0: fa ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x00005571a70303b0: ab ba ba ba ba ba ba ba 61 00 00 00 00 00 00 00

Corruption and header apart:

NMT Block at 0x0000564da16bee10, corruption at: 0x0000564da16c0e20:
0x0000564da16bed90: 00 00 00 00 00 00 00 00 00 ba ba ba ba ba ba ba
0x0000564da16beda0: ba ba ba ba ba ba ba ba 01 01 ba ba 00 00 00 00
0x0000564da16bedb0: 01 00 00 00 00 ba ba ba 64 00 00 00 ba ba ba ba
0x0000564da16bedc0: d0 ed 6b a1 4d 56 00 00 00 00 00 00 00 00 00 00
0x0000564da16bedd0: 00 ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba
0x0000564da16bede0: 00 ba ba ba ba ba ba ba 51 20 00 00 00 00 00 00
0x0000564da16bedf0: ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x0000564da16bee00: 11 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000564da16bee10: 00 20 00 00 00 00 00 00 f1 f1 f1 f1 0f 00 1f fa
0x0000564da16bee20: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16bee30: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16bee40: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16bee50: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16bee60: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16bee70: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16bee80: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
...
0x0000564da16c0da0: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0db0: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0dc0: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0dd0: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0de0: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0df0: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0e00: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0e10: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
0x0000564da16c0e20: 61 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
0x0000564da16c0e30: ab ba ba ba ba ba ba ba d1 61 01 00 00 00 00 00
0x0000564da16c0e40: ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba
0x0000564da16c0e50: ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba
0x0000564da16c0e60: ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba
0x0000564da16c0e70: ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba
0x0000564da16c0e80: ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba
0x0000564da16c0e90: ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba

Thanks!
Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From stefank at openjdk.java.net Thu Nov 18 15:26:10 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Thu, 18 Nov 2021 15:26:10 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable inline caches [v2]
In-Reply-To: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
Message-ID:

> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
>
> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
>
> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
>
> and while this was happening we got a huge number of ICBufferFull safepoints.
>
> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
>
> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
>
>     CompiledIC *ic = CompiledIC_at(iter.reloc());
>     oop ic_oop = ic->cached_oop();
>     if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>     // The only exception is compiledICHolder oops which may
>     // yet be marked below. (We check this further below).
>     if (ic_oop->is_compiledICHolder()) {
>     compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>     if (is_alive->do_object_b(
>     cichk_oop->holder_method()->method_holder()) &&
>     is_alive->do_object_b(cichk_oop->holder_klass())) {
>     continue;
>     }
>     }
>     ic->set_to_clean();
>     assert(ic->cached_oop() == NULL,
>     "cached oop in IC should be cleared");
>     }
>     }
>
>
> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
>
>     CompiledIC *ic = CompiledIC_at(iter.reloc());
>     if (ic->is_icholder_call()) {
>     // The only exception is compiledICHolder oops which may
>     // yet be marked below. (We check this further below).
>     CompiledICHolder* cichk_oop = ic->cached_icholder();
>     if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>     cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>     continue;
>     }
>     } else {
>     Metadata* ic_oop = ic->cached_metadata();
>     if (ic_oop != NULL) {
>     if (ic_oop->is_klass()) {
>     if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>     continue;
>     }
>     } else if (ic_oop->is_method()) {
>     if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>     continue;
>     }
>     } else {
>     ShouldNotReachHere();
>     }
>     }
>     }
>     ic->set_to_clean();
>     }
>
>
> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
>
> To understand why this is causing the problems we are seeing, it's good to start by reading:
> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
>
> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
>
> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer.
When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
>
> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
>
> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches.
It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
>
> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
>
> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
>
> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
>
> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
> > Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints: > > [38.557s][1637062062666ms][info ][gc,phases ] GC(222) Concurrent Mark Free 0.001ms > [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns > [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns > [38.566s][1637062062674ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns > [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns > [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns > [38.568s][1637062062677ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns > [38.570s][1637062062678ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns > [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns > [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns > [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns > [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 15110 ns, 
Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns > [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns > [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns > [38.577s][1637062062685ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns > [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns > [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns > [38.579s][1637062062687ms][info 
][safepoint ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns > [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns > [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns > [38.580s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns > [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns > [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns > [38.581s][1637062062690ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns > [38.582s][1637062062691ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns > [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns > [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns > [38.585s][1637062062693ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns > [38.585s][1637062062694ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 
7480 ns, Total: 87648 ns > [38.586s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns > [38.587s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns > [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns > [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns > [38.589s][1637062062697ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns > [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns > [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns > [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns > [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns > [38.591s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns > [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns > [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time 
since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns
> [38.592s][1637062062701ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns
> [38.594s][1637062062702ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns
> [38.595s][1637062062703ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns
> [38.597s][1637062062705ms][info ][gc,phases ] GC(222) Concurrent Process Non-Strong References 39.443ms
>
>
> Example logs from G1 where the Java threads fix the cleaned inline caches and run out of ICStubs:
>
> [125.998s][1637065197322ms][info ][gc ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms
> [125.998s][1637065197322ms][info ][gc,cpu ] GC(1040) User=0.08s Sys=0.00s Real=0.01s
> [125.998s][1637065197322ms][info ][safepoint ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns
> [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Mark 38.296ms
> [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets
> [126.001s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns
> [126.002s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns
> [126.007s][1637065197331ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns
> [126.009s][1637065197334ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching
safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
> [126.027s][1637065197352ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
>
>
> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
>
> I've run the patch through tier1-7.
>
> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code.

Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:

  Review Coleen

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6450/files
  - new: https://git.openjdk.java.net/jdk/pull/6450/files/8a0aae06..af72104a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6450&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6450&range=00-01

  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6450.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6450/head:pull/6450

PR: https://git.openjdk.java.net/jdk/pull/6450

From stefank at openjdk.java.net Thu Nov 18 15:26:12 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Thu, 18 Nov 2021 15:26:12 GMT
Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable inline caches
In-Reply-To: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
Message-ID:

On Thu, 18 Nov 2021 09:56:37 GMT, Stefan Karlsson wrote:

> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
>
> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
>
> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong
References 25781.928ms
> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
>
> and while this was happening we got a huge number of ICBufferFull safepoints.
>
> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
>
> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
>
>     CompiledIC *ic = CompiledIC_at(iter.reloc());
>     oop ic_oop = ic->cached_oop();
>     if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>     // The only exception is compiledICHolder oops which may
>     // yet be marked below. (We check this further below).
>     if (ic_oop->is_compiledICHolder()) {
>     compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>     if (is_alive->do_object_b(
>     cichk_oop->holder_method()->method_holder()) &&
>     is_alive->do_object_b(cichk_oop->holder_klass())) {
>     continue;
>     }
>     }
>     ic->set_to_clean();
>     assert(ic->cached_oop() == NULL,
>     "cached oop in IC should be cleared");
>     }
>     }
>
>
> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
>
>     CompiledIC *ic = CompiledIC_at(iter.reloc());
>     if (ic->is_icholder_call()) {
>     // The only exception is compiledICHolder oops which may
>     // yet be marked below. (We check this further below).
>     CompiledICHolder* cichk_oop = ic->cached_icholder();
>     if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>     cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>     continue;
>     }
>     } else {
>     Metadata* ic_oop = ic->cached_metadata();
>     if (ic_oop != NULL) {
>     if (ic_oop->is_klass()) {
>     if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>     continue;
>     }
>     } else if (ic_oop->is_method()) {
>     if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>     continue;
>     }
>     } else {
>     ShouldNotReachHere();
>     }
>     }
>     }
>     ic->set_to_clean();
>     }
>
>
> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
>
> To understand why this is causing the problems we are seeing, it's good to start by reading:
> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
>
> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
>
> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
>
> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls.
Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
>
> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
>
> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
>
> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
> > I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels. > > To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines. > > Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints: > > [38.557s][1637062062666ms][info ][gc,phases ] GC(222) Concurrent Mark Free 0.001ms > [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns > [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns > [38.566s][1637062062674ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns > [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns > [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns > [38.568s][1637062062677ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns > [38.570s][1637062062678ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns > [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns > [38.571s][1637062062679ms][info ][safepoint ] Safepoint 
"ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns > [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns > [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns > [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns > [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns > [38.577s][1637062062685ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns 
> [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns > [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns > [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns > [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns > [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns > [38.580s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns > [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns > [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns > [38.581s][1637062062690ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns > [38.582s][1637062062691ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns > [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns > [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching 
safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns > [38.585s][1637062062693ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns > [38.585s][1637062062694ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns > [38.586s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns > [38.587s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns > [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns > [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns > [38.589s][1637062062697ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns > [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns > [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns > [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns > [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns > [38.591s][1637062062700ms][info ][safepoint ] 
Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns > [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns > [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns > [38.592s][1637062062701ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns > [38.594s][1637062062702ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns > [38.595s][1637062062703ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns > [38.597s][1637062062705ms][info ][gc,phases ] GC(222) Concurrent Process Non-Strong References 39.443ms > > > Example logs from G1 where the Java threads fixes the cleaned inline caches and run out of ICStubs: > > [125.998s][1637065197322ms][info ][gc ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms > [125.998s][1637065197322ms][info ][gc,cpu ] GC(1040) User=0.08s Sys=0.00s Real=0.01s > [125.998s][1637065197322ms][info ][safepoint ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns > [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Mark 38.296ms > [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets > [126.001s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns > [126.002s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since 
last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns > [126.007s][1637065197331ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns > [126.009s][1637065197334ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns > [126.027s][1637065197352ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms > > > I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance. > > I've run the patch through tier1-7. > > Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for the discussion and explanation of the inline cache code. I've updated the patch with a comment. Note that we perform an is_clean() check at the top of the function, so we know that the IC is not "clean" at the new return line. 
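The scoping difference discussed in the quoted description can be modelled with a small toy example. This is only a sketch under assumptions: `ToyIC`, `buggy_would_clean` and `fixed_would_clean` are hypothetical names invented for illustration and are not HotSpot code; the real patch lives in the nmethod inline cache cleaning code and may be shaped differently.

```cpp
#include <cassert>

// Hypothetical stand-in for a CompiledIC; not HotSpot's real class.
struct ToyIC {
    bool has_cached_metadata;  // false models a NULL ic_oop (megamorphic vtable call or clean IC)
    bool loader_alive;         // only meaningful when has_cached_metadata is true
};

// Models the regressed shape: the "clean" step sits outside the NULL check,
// so ICs without cached metadata fall through and are cleaned as well.
bool buggy_would_clean(const ToyIC& ic) {
    if (ic.has_cached_metadata) {
        if (ic.loader_alive) {
            return false;      // corresponds to 'continue' in the quoted loop
        }
    }
    return true;               // corresponds to ic->set_to_clean()
}

// Models the intended shape: only ICs whose cached metadata is dead are cleaned.
bool fixed_would_clean(const ToyIC& ic) {
    if (!ic.has_cached_metadata) {
        return false;          // leave megamorphic vtable calls alone
    }
    return !ic.loader_alive;
}
```

In this model the two predicates differ only for an IC without cached metadata, which is the case the thread identifies as the source of the extra ICStub use and ICBufferFull safepoints.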
------------- PR: https://git.openjdk.java.net/jdk/pull/6450 From eosterlund at openjdk.java.net Thu Nov 18 15:40:38 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 18 Nov 2021 15:40:38 GMT Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable inline caches [v2] In-Reply-To: References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com> Message-ID: On Thu, 18 Nov 2021 15:26:10 GMT, Stefan Karlsson wrote: >> We got a report on the zgc-dev list about a large performance issue affecting ZGC: >> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html >> >> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times: >> >> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms >> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms >> >> and while this was happening we got a huge number of ICBufferFull safepoints. >> >> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null: >> >> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103 >> >> CompiledIC *ic = CompiledIC_at(iter.reloc()); >> oop ic_oop = ic->cached_oop(); >> if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) { >> // The only exception is compiledICHolder oops which may >> // yet be marked below. (We check this further below). 
>> if (ic_oop->is_compiledICHolder()) { >> compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop); >> if (is_alive->do_object_b( >> cichk_oop->holder_method()->method_holder()) && >> is_alive->do_object_b(cichk_oop->holder_klass())) { >> continue; >> } >> } >> ic->set_to_clean(); >> assert(ic->cached_oop() == NULL, >> "cached oop in IC should be cleared"); >> } >> } >> >> >> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL: >> >> CompiledIC *ic = CompiledIC_at(iter.reloc()); >> if (ic->is_icholder_call()) { >> // The only exception is compiledICHolder oops which may >> // yet be marked below. (We check this further below). >> CompiledICHolder* cichk_oop = ic->cached_icholder(); >> if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) && >> cichk_oop->holder_klass()->is_loader_alive(is_alive)) { >> continue; >> } >> } else { >> Metadata* ic_oop = ic->cached_metadata(); >> if (ic_oop != NULL) { >> if (ic_oop->is_klass()) { >> if (((Klass*)ic_oop)->is_loader_alive(is_alive)) { >> continue; >> } >> } else if (ic_oop->is_method()) { >> if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) { >> continue; >> } >> } else { >> ShouldNotReachHere(); >> } >> } >> } >> ic->set_to_clean(); >> } >> >> >> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change. >> >> To understand why this is causing the problems we are seeing it's good to start by reading: >> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall >> >> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code). 
>> >> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints. >> >> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly. >> >> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. 
So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints. >> >> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in: >> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html >> >> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test). >> >> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels. >> >> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines. 
>> >> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints: >> >> [38.557s][1637062062666ms][info ][gc,phases ] GC(222) Concurrent Mark Free 0.001ms >> [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns >> [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns >> [38.566s][1637062062674ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns >> [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns >> [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns >> [38.568s][1637062062677ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns >> [38.570s][1637062062678ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns >> [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns >> [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns >> [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns >> [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since 
last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns >> [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns >> [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns >> [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns >> [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns >> [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns >> [38.577s][1637062062685ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns >> [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns >> [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns >> [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns >> [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns >> [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns >> 
[38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns >> [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns >> [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns >> [38.580s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns >> [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns >> [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns >> [38.581s][1637062062690ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns >> [38.582s][1637062062691ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns >> [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns >> [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns >> [38.585s][1637062062693ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns >> [38.585s][1637062062694ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12200 ns, 
Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns >> [38.586s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns >> [38.587s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns >> [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns >> [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns >> [38.589s][1637062062697ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns >> [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns >> [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns >> [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns >> [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns >> [38.591s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns >> [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns >> 
[38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns >> [38.592s][1637062062701ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns >> [38.594s][1637062062702ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns >> [38.595s][1637062062703ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns >> [38.597s][1637062062705ms][info ][gc,phases ] GC(222) Concurrent Process Non-Strong References 39.443ms >> >> >> Example logs from G1 where the Java threads fixes the cleaned inline caches and run out of ICStubs: >> >> [125.998s][1637065197322ms][info ][gc ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms >> [125.998s][1637065197322ms][info ][gc,cpu ] GC(1040) User=0.08s Sys=0.00s Real=0.01s >> [125.998s][1637065197322ms][info ][safepoint ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns >> [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Mark 38.296ms >> [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets >> [126.001s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns >> [126.002s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns >> [126.007s][1637065197331ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns >> 
[126.009s][1637065197334ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns >> [126.027s][1637065197352ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms >> >> >> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance. >> >> I've run the patch through tier1-7. >> >> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for the discussion and explanation of the inline cache code. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Review Coleen Marked as reviewed by eosterlund (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6450 From psandoz at openjdk.java.net Thu Nov 18 16:20:49 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 18 Nov 2021 16:20:49 GMT Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg In-Reply-To: References: Message-ID: <_ixIA9gCfmBYP8l9ZK121Z52eAnTpWE8nMPFEmY8oTA=.97c5f64b-8571-4ecc-a05a-a0f0c468c471@github.com> On Wed, 17 Nov 2021 22:54:06 GMT, Sandhya Viswanathan wrote: >> Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment: >> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541 >> >> The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though because C2 might still compile that path when it is not able to prove that it's unreachable at parse time. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. 
there is no `Op_NegVL`). I propose to simply handle this case in `VectorSupport::vop2ideal`. We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0: >> https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394 >> >> Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug? >> >> Thanks, >> Tobias > > Looks good to me. @sviswa7 @jatin-bhateja any thoughts on the other related FIXMEs brought up by Tobias? e.g. if (op == AND_NOT) { // FIXME: Support this in the JIT. that = that.lanewise(NOT); op = AND; ------------- PR: https://git.openjdk.java.net/jdk/pull/6428 From sviswanathan at openjdk.java.net Thu Nov 18 16:35:40 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 18 Nov 2021 16:35:40 GMT Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg In-Reply-To: <_ixIA9gCfmBYP8l9ZK121Z52eAnTpWE8nMPFEmY8oTA=.97c5f64b-8571-4ecc-a05a-a0f0c468c471@github.com> References: <_ixIA9gCfmBYP8l9ZK121Z52eAnTpWE8nMPFEmY8oTA=.97c5f64b-8571-4ecc-a05a-a0f0c468c471@github.com> Message-ID: On Thu, 18 Nov 2021 16:18:10 GMT, Paul Sandoz wrote: >> Looks good to me. > > @sviswa7 @jatin-bhateja any thoughts on the other related FIXMEs brought up by Tobias? e.g. > > > if (op == AND_NOT) { > // FIXME: Support this in the JIT. > that = that.lanewise(NOT); > op = AND; @PaulSandoz Those FIXME notes are from John, pointing out to us where further optimizations are possible; they are not related to correctness. I also looked at vop2ideal; it now handles all the opcodes for the relevant data types (integral/fp). 
------------- PR: https://git.openjdk.java.net/jdk/pull/6428 From zgu at openjdk.java.net Thu Nov 18 16:45:41 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 18 Nov 2021 16:45:41 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 15:25:15 GMT, Thomas Stuefe wrote: >> This is part of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot. >> >> This proposal adds NMT buffer overflow checking: >> >> - it gives us C-heap overflow checking in release builds >> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway. >> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check. >> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch. >> >> For more details, please see the JBS issue. >> >> ---- >> >> Patch notes: >> >> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp. >> - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16 bits arguably still is: malloc site table width is 512. 
>> - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere. >> >> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios. >> >> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting. >> >> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE). >> >> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now. >> >> - I made the assert for malloc site table width a compile time STATIC_ASSERT. >> >> -------------- >> >> Example output a buffer overrun would provide: >> >> >> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) 
>> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: >> 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 >> 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00 >> 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)# >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805 >> # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) >> # >> >> ------- >> >> Tests: >> - manual tests with Linux x64, x86, minimal build >> - GHAs all clean >> - SAP nightlies ran for 4 weeks now without problems > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback Volker Changes requested by zgu (Reviewer). src/hotspot/share/runtime/os.cpp line 750: > 748: return MemTracker::record_malloc(ptr, size, memflags, stack, level); > 749: #else > 750: if (memblock == NULL) { I think you also need to subtract malloc_footer_size when calculating memblock_size below. Otherwise, memcpy can overwrite the footer. I wonder whether we should just consolidate malloc_header_size and malloc_footer_size into one malloc_overhead? I don't see them used separately. 
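The size accounting behind this review comment can be illustrated with a toy block layout of header, payload and a one-byte footer canary. This is a minimal sketch under assumptions: the constants `kHeaderSize`, `kFooterSize`, `kFooterMark` and the helpers `make_block`, `payload_size_of` and `toy_realloc` are hypothetical and only mimic the idea; they are not the NMT or `os::realloc` code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sizes and mark value, chosen only for illustration.
constexpr std::size_t kHeaderSize = 16;
constexpr std::size_t kFooterSize = 1;
constexpr std::uint8_t kFooterMark = 0xFA;

// Toy block layout: [header | payload | footer canary].
std::vector<std::uint8_t> make_block(std::size_t payload_size) {
    std::vector<std::uint8_t> block(kHeaderSize + payload_size + kFooterSize, 0);
    block.back() = kFooterMark;
    return block;
}

bool footer_intact(const std::vector<std::uint8_t>& block) {
    return block.back() == kFooterMark;
}

// The payload size is derived from the total size by subtracting BOTH
// the header and the footer; forgetting the footer would count the
// canary byte as user data.
std::size_t payload_size_of(const std::vector<std::uint8_t>& block) {
    return block.size() - kHeaderSize - kFooterSize;
}

// Realloc-style copy: never copy more than either payload can hold.
std::vector<std::uint8_t> toy_realloc(const std::vector<std::uint8_t>& old_block,
                                      std::size_t new_payload_size) {
    std::vector<std::uint8_t> fresh = make_block(new_payload_size);
    const std::size_t n = std::min(payload_size_of(old_block), new_payload_size);
    std::memcpy(fresh.data() + kHeaderSize, old_block.data() + kHeaderSize, n);
    return fresh;
}
```

With the footer excluded from the payload size, growing a block copies only the user bytes, and the old canary byte never ends up inside the new payload.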
------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From psandoz at openjdk.java.net Thu Nov 18 16:49:45 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 18 Nov 2021 16:49:45 GMT Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg In-Reply-To: <_ixIA9gCfmBYP8l9ZK121Z52eAnTpWE8nMPFEmY8oTA=.97c5f64b-8571-4ecc-a05a-a0f0c468c471@github.com> References: <_ixIA9gCfmBYP8l9ZK121Z52eAnTpWE8nMPFEmY8oTA=.97c5f64b-8571-4ecc-a05a-a0f0c468c471@github.com> Message-ID: <7weUFVLrZV9GXlxUfOyRekglxGisX-5jl6cxm0KoovY=.5e8165a9-7b9c-4493-abca-a339526eed8a@github.com> On Thu, 18 Nov 2021 16:18:10 GMT, Paul Sandoz wrote: >> Looks good to me. > > @sviswa7 @jatin-bhateja any thoughts on the other related FIXMEs brought up by Tobias? e.g. > > > if (op == AND_NOT) { > // FIXME: Support this in the JIT. > that = that.lanewise(NOT); > op = AND; > @PaulSandoz Those FIXME notes are from John, pointing out to us where further optimizations are possible; they are not related to correctness. I also looked at vop2ideal; it now handles all the opcodes for the relevant data types (integral/fp). Thanks, I also looked at `vop2ideal` and concluded the same. ------------- PR: https://git.openjdk.java.net/jdk/pull/6428 From aph at openjdk.java.net Thu Nov 18 16:55:43 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Nov 2021 16:55:43 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 13:53:11 GMT, Andrew Haley wrote: > I'm baffled by what is going on. Sorry, it looks like I managed to confuse myself. The top of the loop looks like: 10c B17: # out( B18 ) <- in( B27 ) Freq: 4.49963 10c # castLL of R2 10c sve_whilelo P0, zr, R2 # sve 110 sve_ldr V16, P0, [R0] # load vector predicated (sve) 114 sve_str [R1], P0, V16 # store vector predicated (sve) 118 B18: # out( B30 B19 ) <- in( B17 B28 B26 ) Freq: 8.99927 118 118 ldarb R10, [R23] # byte ! 
Field: volatile org/openjdk/jmh/runner/InfraControlL2.isDone ... and the bottom 1a0 cmp R2, #64 1a4 bls B17 # unsigned P=0.500000 C=-1.000000 1a8 B28: # out( B18 ) <- in( B27 ) Freq: 4.49963 1a8 CALL, runtime leaf nofp 0x0000ffff6d1058f8 jbyte_arraycopy No JVM State Info # 1b0 b B18 So only if the length is < 64 bytes (i.e. 512 bits) do we branch back to B17 to do the `SVE WHILELO` to set the predicate. This is confusing only because the code has been rearranged so that the test for < 64 bytes is at the bottom of the loop. ------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From simonis at openjdk.java.net Thu Nov 18 17:09:42 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 18 Nov 2021 17:09:42 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 14:16:12 GMT, Thomas Stuefe wrote: >> src/hotspot/share/services/mallocTracker.hpp line 314: >> >>> 312: static const uint8_t _footer_canary_dead_mark = 0xFB; >>> 313: NOT_LP64(static const uint32_t _header_alt_canary_life_mark = 0xFAFA1F1F;) >>> 314: NOT_LP64(static const uint32_t _header_alt_canary_dead_mark = 0xFBFB1F1F;) >> >> Just out of interest, how did you choose these canary marks? Is there some evidence that they appear less frequently in real code/data than other values? > > I did an extensive statistical analysis of many core dumps. > > ... > > ... > > Just kidding, I chose them on a whim to be not zero :) Do you have a better suggestion? I thought about making them an ASCII pattern, but those are actually more common in payload data. I was just thinking of the usual suspects like 0xcafebabe, 0xbaadbabe or 0xdeadbeef because that would simplify the detection of these markers in core dumps, hs_err files or during debugging. 
But I'm fine with whatever you choose :) ------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From dcubed at openjdk.java.net Thu Nov 18 17:18:39 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 18 Nov 2021 17:18:39 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote: >> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. >> The fix is to add a check for is_exiting() status early in the handshake closure do_thread(). >> The following handshake closures are fixed by this update: >> - UpdateForPopTopFrameClosure >> - SetForceEarlyReturn >> - SetFramePopClosure > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL src/hotspot/share/prims/jvmtiEnvBase.cpp line 1533: > 1531: return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */ > 1532: } > 1533: assert(java_thread == _state->get_thread(), "Must be"); This `assert()` is the site of the original test failure. I haven't yet looked at the locations of the other changes. The `is_exiting()` check is made under the protection of the `JvmtiThreadState_lock` so an unsuspended target thread that is exiting cannot reach the point where the `_state` is updated to clear the `JavaThread*` so we can't fail the `assert()` if the `is_exiting()` check has returned `false`.
------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From nradomski at openjdk.java.net Thu Nov 18 17:22:28 2021 From: nradomski at openjdk.java.net (Niklas Radomski) Date: Thu, 18 Nov 2021 17:22:28 GMT Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le [v2] In-Reply-To: References: Message-ID: <-yZ-CH3Zdp9FbqRbcKA908KaXU7dWIuzy_SCyMluDr4=.31797995-8447-4902-89bd-693948925e49@github.com> > Port the Shenandoah garbage collector (JDK-8241457)[https://bugs.openjdk.java.net/browse/JDK-8241457] to linux on ppc64le. Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision: Remove debug clobber code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6325/files - new: https://git.openjdk.java.net/jdk/pull/6325/files/1dec8885..c504b66d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6325&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6325&range=00-01 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6325.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6325/head:pull/6325 PR: https://git.openjdk.java.net/jdk/pull/6325 From dcubed at openjdk.java.net Thu Nov 18 17:26:38 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 18 Nov 2021 17:26:38 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote: >> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. >> The fix is to add a check for is_exiting() status into handshake closure do_thread() early. 
>> The following handshake closures are fixed by this update: >> - UpdateForPopTopFrameClosure >> - SetForceEarlyReturn >> - SetFramePopClosure > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL I don't see a reason for the change in `SetForceEarlyReturn::doit()`, but I'm okay with the other changes. src/hotspot/share/prims/jvmtiEnvBase.cpp line 1401: > 1399: if (!self) { > 1400: if (!java_thread->is_suspended()) { > 1401: _result = JVMTI_ERROR_THREAD_NOT_SUSPENDED; I don't see an obvious reason for this `is_exiting()` check. src/hotspot/share/prims/jvmtiEnvBase.cpp line 1625: > 1623: return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */ > 1624: } > 1625: assert(_state->get_thread() == java_thread, "Must be"); The `assert()` on L1625 is subject to the same race as the original site. This `is_exiting()` check is made under the protection of the `JvmtiThreadState_lock` so it is sufficient to protect that `assert()`. ------------- Changes requested by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6440 From aph at openjdk.java.net Thu Nov 18 17:27:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Nov 2021 17:27:41 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote: > Arraycopy partial inlining is a C2 compiler technique that avoids stub > call overhead in small-sized arraycopy operations by generating masked > vector instructions. So far it works on x86 AVX512 only and this patch > enables it on AArch64 with SVE. > > We add AArch64 matching rule for VectorMaskGenNode and refactor that > node a little bit. The major change is moving the element type field > into its TypeVectMask bottom type. The reason is that AArch64 vector > masks are different for different vector element types.
> > E.g., an x86 AVX512 vector mask value masking 3 least significant vector > lanes (of any type) is like > > `0000 0000 ... 0000 0000 0000 0000 0111` > > On AArch64 SVE, this mask value can only be used for masking the 3 least > significant lanes of bytes. But for 3 lanes of ints, the value should be > > `0000 0000 ... 0000 0000 0001 0001 0001` > > where the least significant bit of each lane matters. So AArch64 matcher > needs to know the vector element type to generate right masks. > > After this patch, the C2 generated code for copying a 50-byte array on > AArch64 SVE looks like > > mov x12, #0x32 > whilelo p0.b, xzr, x12 > add x11, x11, #0x10 > ld1b {z16.b}, p0/z, [x11] > add x10, x10, #0x10 > st1b {z16.b}, p0, [x10] > > We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on > both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested > JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array > size arguments on a 512-bit SVE-featured CPU. We got below performance > data changes. > > Benchmark (length) (Performance) > ArrayCopyAligned.testByte 10 -2.6% > ArrayCopyAligned.testByte 20 +4.7% > ArrayCopyAligned.testByte 30 +4.8% > ArrayCopyAligned.testByte 40 +21.7% > ArrayCopyAligned.testByte 50 +22.5% > ArrayCopyAligned.testByte 60 +28.4% > > The test machine has SVE vector size of 512 bits, so we see performance > gain for most array sizes less than 64 bytes. For very small arrays we > see a bit regression because a vector load/store may be a bit slower > than 1 or 2 scalar loads/stores. Hurrah! I have managed to duplicate your results. Old: Benchmark (length) Mode Cnt Score Error Units ArrayCopyAligned.testByte 40 avgt 5 23.332 ± 0.016 ns/op New: ArrayCopyAligned.testByte 40 avgt 5 18.092 ± 0.093 ns/op ... and in fact your result is much better than this suggests, because the bulk of the test is fetching all of the arguments to arraycopy, not actually copying the bytes. I get it now.
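As an aside on the mask layouts quoted in the patch description: the two bit patterns can be reproduced with a small sketch. This is only an illustration, assuming one mask bit per lane for the AVX512-style mask and one predicate bit per vector byte for the SVE-style mask, with the lowest bit of each element being the significant one. The function names `avx512_style_mask` and `sve_style_mask` are made up for this note and are not HotSpot or C2 APIs.

```cpp
#include <cstdint>

// AVX512-style mask (illustrative): one bit per lane, independent of the
// element type. Valid for lanes in [0, 64].
constexpr std::uint64_t avx512_style_mask(unsigned lanes) {
  return lanes >= 64 ? ~std::uint64_t{0}
                     : (std::uint64_t{1} << lanes) - 1;
}

// SVE-style predicate (illustrative): one bit per vector byte, and only the
// lowest bit of each element's group of bits is set for an active lane.
// The caller must keep lanes * elem_bytes <= 64 (a 512-bit vector).
constexpr std::uint64_t sve_style_mask(unsigned lanes, unsigned elem_bytes) {
  std::uint64_t mask = 0;
  for (unsigned i = 0; i < lanes; ++i) {
    mask |= std::uint64_t{1} << (i * elem_bytes);
  }
  return mask;
}

// The two patterns from the description: 3 lanes of bytes and 3 lanes of ints.
static_assert(avx512_style_mask(3) == 0x7, "0111");
static_assert(sve_style_mask(3, 1) == 0x7, "0111");
static_assert(sve_style_mask(3, 4) == 0x111, "0001 0001 0001");
```

The byte case is the only one where both layouts coincide, which is consistent with the description's point that the AArch64 matcher needs the element type to generate the right mask.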
------------- PR: https://git.openjdk.java.net/jdk/pull/6444 From coleenp at openjdk.java.net Thu Nov 18 17:34:42 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 18 Nov 2021 17:34:42 GMT Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable inline caches [v2] In-Reply-To: References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com> Message-ID: On Thu, 18 Nov 2021 15:26:10 GMT, Stefan Karlsson wrote: >> We got a report on the zgc-dev list about a large performance issue affecting ZGC: >> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html >> >> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times: >> >> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms >> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms >> >> and while this was happening we got a huge number of ICBufferFull safepoints. >> >> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null: >> >> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103 >> >> CompiledIC *ic = CompiledIC_at(iter.reloc()); >> oop ic_oop = ic->cached_oop(); >> if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) { >> // The only exception is compiledICHolder oops which may >> // yet be marked below. (We check this further below).
>> if (ic_oop->is_compiledICHolder()) { >> compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop); >> if (is_alive->do_object_b( >> cichk_oop->holder_method()->method_holder()) && >> is_alive->do_object_b(cichk_oop->holder_klass())) { >> continue; >> } >> } >> ic->set_to_clean(); >> assert(ic->cached_oop() == NULL, >> "cached oop in IC should be cleared"); >> } >> } >> >> >> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL: >> >> CompiledIC *ic = CompiledIC_at(iter.reloc()); >> if (ic->is_icholder_call()) { >> // The only exception is compiledICHolder oops which may >> // yet be marked below. (We check this further below). >> CompiledICHolder* cichk_oop = ic->cached_icholder(); >> if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) && >> cichk_oop->holder_klass()->is_loader_alive(is_alive)) { >> continue; >> } >> } else { >> Metadata* ic_oop = ic->cached_metadata(); >> if (ic_oop != NULL) { >> if (ic_oop->is_klass()) { >> if (((Klass*)ic_oop)->is_loader_alive(is_alive)) { >> continue; >> } >> } else if (ic_oop->is_method()) { >> if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) { >> continue; >> } >> } else { >> ShouldNotReachHere(); >> } >> } >> } >> ic->set_to_clean(); >> } >> >> >> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change. >> >> To understand why this is causing the problems we are seeing it's good to start by reading: >> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall >> >> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code). 
>> >> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints. >> >> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both the GC and the Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches, exit the class unloading phase, and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly. >> >> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned.
So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints. >> >> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in: >> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html >> >> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test). >> >> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels. >> >> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
>> >> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints: >> >> [38.557s][1637062062666ms][info ][gc,phases ] GC(222) Concurrent Mark Free 0.001ms >> [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns >> [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns >> [38.566s][1637062062674ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns >> [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns >> [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns >> [38.568s][1637062062677ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns >> [38.570s][1637062062678ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns >> [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns >> [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns >> [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns >> [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since 
last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns >> [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns >> [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns >> [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns >> [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns >> [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns >> [38.577s][1637062062685ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns >> [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns >> [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns >> [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns >> [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns >> [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns >> 
[38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns >> [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns >> [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns >> [38.580s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns >> [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns >> [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns >> [38.581s][1637062062690ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns >> [38.582s][1637062062691ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns >> [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns >> [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns >> [38.585s][1637062062693ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns >> [38.585s][1637062062694ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12200 ns, 
Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns >> [38.586s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns >> [38.587s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns >> [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns >> [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns >> [38.589s][1637062062697ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns >> [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns >> [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns >> [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns >> [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns >> [38.591s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns >> [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns >> 
[38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns >> [38.592s][1637062062701ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns >> [38.594s][1637062062702ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns >> [38.595s][1637062062703ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns >> [38.597s][1637062062705ms][info ][gc,phases ] GC(222) Concurrent Process Non-Strong References 39.443ms >> >> >> Example logs from G1 where the Java threads fixes the cleaned inline caches and run out of ICStubs: >> >> [125.998s][1637065197322ms][info ][gc ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms >> [125.998s][1637065197322ms][info ][gc,cpu ] GC(1040) User=0.08s Sys=0.00s Real=0.01s >> [125.998s][1637065197322ms][info ][safepoint ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns >> [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Mark 38.296ms >> [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets >> [126.001s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns >> [126.002s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns >> [126.007s][1637065197331ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns >> 
[126.009s][1637065197334ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns >> [126.027s][1637065197352ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms >> >> >> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance. >> >> I've run the patch through tier1-7. >> >> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for the discussion and explanation of the inline cache code. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Review Coleen Marked as reviewed by coleenp (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6450 From coleenp at openjdk.java.net Thu Nov 18 17:34:43 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 18 Nov 2021 17:34:43 GMT Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable inline caches [v2] In-Reply-To: References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com> Message-ID: <8UhBoFffqzWeQJ96suEJbxltD8NOjsX7-MxUYzC20wU=.bbb9e801-600e-49f0-8e65-f5c82f251316@github.com> On Thu, 18 Nov 2021 14:32:50 GMT, Coleen Phillimore wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Review Coleen > > src/hotspot/share/code/compiledMethod.cpp line 482: > >> 480: } >> 481: } else { >> 482: return true; > > I've given up pretending to understand this code, but could you add a one-line comment on why you're returning true here? i.e. if ic_metadata is NULL, it's a megamorphic call or already clean and shouldn't be cleaned. Thanks for the comment.
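For readers following the scoping issue discussed in this thread, here is a heavily simplified model of the decision, not the HotSpot code itself. `Metadata`, `should_clean_buggy` and `should_clean_fixed` are hypothetical names invented for this sketch; the only thing it tries to capture is that a NULL cached value (megamorphic vtable call or already clean) should not lead to cleaning.

```cpp
// Hypothetical stand-in for the cached Klass*/Method* of an inline cache.
struct Metadata {
  bool loader_alive;  // stands in for is_loader_alive(is_alive)
};

// Shape of the accidental behavior: the "clean" decision sits outside the
// non-null check, so a null cached value also ends up being cleaned.
constexpr bool should_clean_buggy(const Metadata* cached) {
  if (cached != nullptr) {
    if (cached->loader_alive) {
      return false;  // still in use, keep the inline cache
    }
  }
  return true;  // reached for dead metadata AND for cached == nullptr
}

// Shape of the intended behavior: nothing cached means megamorphic or
// already clean, so leave the inline cache alone.
constexpr bool should_clean_fixed(const Metadata* cached) {
  if (cached == nullptr) {
    return false;
  }
  return !cached->loader_alive;
}
```

Both variants agree for live and dead metadata; they differ only in the NULL case, which is the case the review comment asks to have documented.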
------------- PR: https://git.openjdk.java.net/jdk/pull/6450 From simonis at openjdk.java.net Thu Nov 18 17:36:41 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 18 Nov 2021 17:36:41 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v3] In-Reply-To: References: Message-ID: <9GqbVZKY1Z5fCvB-vuCwqIFwPXEDU1nHd002J3SS2KM=.a1ca56cc-016b-4daf-9f69-5bbf60f32e71@github.com> On Thu, 18 Nov 2021 15:25:15 GMT, Thomas Stuefe wrote: >> This is part of a number of RFEs I plan to improve and simplify C-heap overflow checking in hotspot. >> >> This proposal adds NMT buffer overflow checking: >> >> - it gives us C-heap overflow checking in release builds >> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway. >> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check. >> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch. >> >> For more details, please see the JBS issue. >> >> ---- >> >> Patch notes: >> >> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp. >> - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode.
With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512. >> - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere. >> >> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios. >> >> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting. >> >> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE). >> >> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now. >> >> - I made the assert for malloc site table width a compile time STATIC_ASSERT. >> >> -------------- >> >> Example output a buffer overrun would provide: >> >> >> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) 
>> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: >> 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 >> 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00 >> 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)# >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805 >> # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) >> # >> >> ------- >> >> Tests: >> - manual tests with Linux x64, x86, minimal build >> - GHAs all clean >> - SAP nightlies ran for 4 weeks now without problems > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback Volker Looks good to me now except for Zhengyu's question. src/hotspot/share/services/mallocTracker.cpp line 134: > 132: > 133: // This function prints block information, including hex dump, in case of a detected > 134: // corruption. The hex dump should show the both block header and the corruption site ..show both, the block header..
test/hotspot/gtest/nmt/test_nmt_buffer_overflow_detection.cpp line 69: > 67: /////// > 68: > 69: // A overwriter farther away from the NMT header; the report should show the hex dump split up An overwrite ------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From stuefe at openjdk.java.net Thu Nov 18 17:51:42 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 18 Nov 2021 17:51:42 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 15:25:15 GMT, Thomas Stuefe wrote: >> This is part of a number of RFEs I plan to improve and simplify C-heap overflow checking in hotspot. >> >> This proposal adds NMT buffer overflow checking: >> >> - it gives us C-heap overflow checking in release builds >> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway. >> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check. >> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch. >> >> For more details, please see the JBS issue. >> >> ---- >> >> Patch notes: >> >> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp. >> - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits.
That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512. >> - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere. >> >> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios. >> >> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting. >> >> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE). >> >> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now. >> >> - I made the assert for malloc site table width a compile time STATIC_ASSERT. >> >> -------------- >> >> Example output a buffer overrun would provide: >> >> >> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) 
>> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: >> 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 >> 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00 >> 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)# >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805 >> # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) >> # >> >> ------- >> >> Tests: >> - manual tests with Linux x64, x86, minimal build >> - GHAs all clean >> - SAP nightlies ran for 4 weeks now without problems > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback Volker > I think you also need to subtract malloc_footer_size when calculating memblock_size below. Otherwise, memcpy can overwrite the footer. Oh man, good catch... this is too complicated. See, that's why I want to remove the GuardedMemory layer. Having that gone will be such a relief. So this is for a resize to a smaller size. We have this: [guard header] [nmt header] [ ... payload ... ] [nmt footer] [guard footer] and both nmt header and footer are now, from the POV of GuardedMemory, part of its payload. 
The os::malloc above already allocates a new block, and we need to copy the user payload while leaving the NMT footer intact. I'll first write a repro case - this should have been caught by tests - then I'll think about a solution. > I wonder should just consolidate malloc_header_size and malloc_footer_size to one malloc_overhead? I don't see them used separately. I'll take a look. I also found that we have two versions of malloc_header_size() - one takes the NMT level, one takes a pointer. That makes me nervous resolution-wise, though it's very probably fine. Maybe we can reduce the complexity a bit. Though I prefer to keep this patch as small as possible. ------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From mdoerr at openjdk.java.net Thu Nov 18 18:35:43 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 18 Nov 2021 18:35:43 GMT Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le [v2] In-Reply-To: <-yZ-CH3Zdp9FbqRbcKA908KaXU7dWIuzy_SCyMluDr4=.31797995-8447-4902-89bd-693948925e49@github.com> References: <-yZ-CH3Zdp9FbqRbcKA908KaXU7dWIuzy_SCyMluDr4=.31797995-8447-4902-89bd-693948925e49@github.com> Message-ID: <6ikSOeIWtJPZbIzHuiiEbSmpT60lFaZgOWejMxyAg80=.378fb020-a2e1-42f9-8e38-0985408f87f1@github.com> On Thu, 18 Nov 2021 17:22:28 GMT, Niklas Radomski wrote: >> Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le. > > Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision: > > Remove debug clobber code Thanks for the update! I think it's good to go. ------------- Marked as reviewed by mdoerr (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6325 From nradomski at openjdk.java.net Thu Nov 18 18:58:39 2021 From: nradomski at openjdk.java.net (Niklas Radomski) Date: Thu, 18 Nov 2021 18:58:39 GMT Subject: RFR: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le [v2] In-Reply-To: <-yZ-CH3Zdp9FbqRbcKA908KaXU7dWIuzy_SCyMluDr4=.31797995-8447-4902-89bd-693948925e49@github.com> References: <-yZ-CH3Zdp9FbqRbcKA908KaXU7dWIuzy_SCyMluDr4=.31797995-8447-4902-89bd-693948925e49@github.com> Message-ID: On Thu, 18 Nov 2021 17:22:28 GMT, Niklas Radomski wrote: >> Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le. > > Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision: > > Remove debug clobber code Thank you for your reviews! Happy to see that the change has been so well received. ------------- PR: https://git.openjdk.java.net/jdk/pull/6325 From nradomski at openjdk.java.net Thu Nov 18 19:07:46 2021 From: nradomski at openjdk.java.net (Niklas Radomski) Date: Thu, 18 Nov 2021 19:07:46 GMT Subject: Integrated: 8276927: [PPC64] Port shenandoahgc to linux on ppc64le In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 09:00:04 GMT, Niklas Radomski wrote: > Port the Shenandoah garbage collector [JDK-8241457](https://bugs.openjdk.java.net/browse/JDK-8241457) to linux on ppc64le. This pull request has now been integrated. 
Changeset: 57eb8647 Author: Niklas Radomski Committer: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/57eb864765f38185f8db8f1d37681d6cfe2a3c73 Stats: 1521 lines in 8 files changed: 1519 ins; 0 del; 2 mod 8276927: [PPC64] Port shenandoahgc to linux on ppc64le Reviewed-by: rkennke, ihse, mdoerr ------------- PR: https://git.openjdk.java.net/jdk/pull/6325 From smarks at openjdk.java.net Thu Nov 18 19:30:49 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Thu, 18 Nov 2021 19:30:49 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 15:05:49 GMT, Peter Levart wrote: >> Or, you could move the static initialization block that starts the finalizer thread into the Finalizer.FinalizerThread class itself and then arrange for that class to be initialized explicitly immediately after the Finalizer class, but conditionally, only if the option to disable finalization was not specified... >> This way the Finalizer class could still be initialized early, but the thread would not be started if it is not needed. > > If you then need this "flag" in the assert of registerFinalizer and runFinalization, you could use unsafe.shouldBeInitialized(Finalizer.FinalizerThread.class) as a means to find out whether the flag was set or not... The disable-finalization feature is a bit more than experimental. The goal is to provide a faithful representation of what the system will look like when finalization is removed. Of course most of that is objects' `finalize` methods not being called, but it also includes having no finalizer thread running, as well as having `runFinalization` (a public API) do nothing at all. Thus I think it's useful to have the flag visible to Java.
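As a rough illustration of why a Java-visible flag is convenient, here is a minimal sketch of the two library-side effects mentioned above: no worker thread is started, and `runFinalization` returns without doing anything. Everything in it is hypothetical: `FinalizationModel` is a toy class, not the JDK's `Finalizer`, and the constructor argument merely stands in for whatever mechanism reports the flag to Java code.

```java
// Toy model only; FinalizationModel and its members are hypothetical names.
final class FinalizationModel {
    private final boolean enabled;
    private Thread worker;   // stays null when finalization is disabled
    private int runs;        // counts runFinalization calls that did work

    FinalizationModel(boolean enabled) {
        this.enabled = enabled;
        if (enabled) {
            // Stand-in for starting the finalizer thread.
            worker = new Thread(() -> { }, "model-finalizer");
            worker.setDaemon(true);
            worker.start();
        }
    }

    boolean hasWorkerThread() {
        return worker != null;
    }

    // Mirrors the idea that the public API does nothing when disabled.
    void runFinalization() {
        if (!enabled) {
            return;
        }
        runs++;
    }

    int runCount() {
        return runs;
    }
}
```

With the flag off, `new FinalizationModel(false)` creates no thread and its `runFinalization()` leaves the counter at zero; with the flag on, both effects are observable.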
------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From smarks at openjdk.java.net Thu Nov 18 19:30:51 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Thu, 18 Nov 2021 19:30:51 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v2] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 07:52:18 GMT, David Holmes wrote: >> Yeah, "flag" is `Holder.ENABLED` here. I mean, are Java methods `registerFinalizer` and `runFinalization` called only by VM? If so, can VM check the whole thing on VM side, without going to Java and asking back from there? > > `registerFinalizer` does not expect to be called and only uses the "flag" as a form of assertion. > > `runFinalization` is called from Java code. @dholmes-ora If the Finalizer class is initialized explicitly and at the right time, then maybe we can do away with the Holder class entirely. Can you point me to where this is done? ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From smarks at openjdk.java.net Thu Nov 18 20:05:15 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Thu, 18 Nov 2021 20:05:15 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: References: Message-ID: <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com> > Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. > > Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421). 
Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: Rename InstanceKlass::finalization_enabled to is_finalization_enabled. Minor cleanups. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6442/files - new: https://git.openjdk.java.net/jdk/pull/6442/files/911af0b1..5df8bf9f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=01-02 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6442.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6442/head:pull/6442 PR: https://git.openjdk.java.net/jdk/pull/6442 From mchung at openjdk.java.net Thu Nov 18 20:16:37 2021 From: mchung at openjdk.java.net (Mandy Chung) Date: Thu, 18 Nov 2021 20:16:37 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: References: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com> Message-ID: On Thu, 18 Nov 2021 06:49:03 GMT, Kim Barrett wrote: >> src/hotspot/share/prims/jvm.cpp line 694: >> >>> 692: >>> 693: JVM_ENTRY(jboolean, JVM_IsFinalizationEnabled(JNIEnv * env)) >>> 694: return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE; >> >> missing indentation > > I think this could just be `return InstanceKlass::finalization_enabled();`. There is lots of code in this file and elsewhere that assumes C++ `bool` converts to `jboolean` appropriately. One typical way for VM to pass the arguments to the library is via private system properties. System::initPhase1 will save the VM properties in `jdk.internal.misc.VM` and filters out the private properties from the system properties returned from System::getProperties (see System::createProperties). You can query the flag via `jdk.internal.misc.VM.getProperty("jdk.finalization.disabled")` for example. 
I don't see any issue moving the Finalizer class initialization after initPhase1 since there is no finalizer during VM startup. ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From mchung at openjdk.java.net Thu Nov 18 21:18:38 2021 From: mchung at openjdk.java.net (Mandy Chung) Date: Thu, 18 Nov 2021 21:18:38 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com> References: <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com> Message-ID: <0NKLwjDgge6gVJVHZr8o87VQvjz2FhNte1UPgiqs9qA=.c4eda9c4-d8a7-4ccc-8c64-9967e3c2923c@github.com> On Thu, 18 Nov 2021 20:05:15 GMT, Stuart Marks wrote: >> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. >> >> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421). > > Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: > > Rename InstanceKlass::finalization_enabled to is_finalization_enabled. Minor cleanups. When the finalization is disabled, perhaps jcmd GC.finalizer_info should just be made as a nop in the VM. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From smarks at openjdk.java.net Thu Nov 18 21:22:54 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Thu, 18 Nov 2021 21:22:54 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: References: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com> Message-ID: <3QzIh53czhYZl6kAtuP4lbxnBXY_eb5gmR1fN-WnBiY=.9d088dde-61c0-4151-8880-d7b7a66c317d@github.com> On Thu, 18 Nov 2021 20:13:23 GMT, Mandy Chung wrote: >> I think this could just be `return InstanceKlass::finalization_enabled();`. There is lots of code in this file and elsewhere that assumes C++ `bool` converts to `jboolean` appropriately. > > One typical way for VM to pass the arguments to the library is via private system properties. System::initPhase1 will save the VM properties in `jdk.internal.misc.VM` and filters out the private properties from the system properties returned from System::getProperties (see System::createProperties). > > You can query the flag via `jdk.internal.misc.VM.getProperty("jdk.finalization.disabled")` for example. > > I don't see any issue moving the Finalizer class initialization after initPhase1 since there is no finalizer during VM startup. I renamed the function to `is_finalization_enabled` per previous comment, and I also made these cleanups. ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From rkennke at openjdk.java.net Thu Nov 18 21:35:51 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 18 Nov 2021 21:35:51 GMT Subject: Integrated: 8275527: Refactor forward pointer access In-Reply-To: References: Message-ID: On Thu, 14 Oct 2021 16:37:02 GMT, Roman Kennke wrote: > Accessing the forward pointer is currently a little inconsistent. 
Some code paths call oopDesc::forwardee() / oopDesc::is_forwarded(), some code paths call forwardee() and check it for ==/!= NULL, some code paths even call markWord::decode_pointer() and markWord::is_marked() instead. > > This change attempts to make the situation more consistent. For simple cases it preserves oopDesc::forwardee() / is_forwarded(), some cases need to use the markWord for consistency in concurrent GC, they now use markWord::forwardee() and markWord::is_forwarded(). Also, checking whether or not an object is forwarded is now consistently done using is_forwarded() and not by checking forwardee ==/!= NULL. This also resolves the mess in G1 full GC that changes not-forwarded objects to have a NULL (fake-) pointer. This is not necessary, because we can just as well use the lock bits to determine whether or not the object is forwarded. > > Testing: > - [x] tier > - [x] tier2 > - [x] hotspot_gc This pull request has now been integrated. Changeset: 89b125f4 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/89b125f4d4d6a467185b4b39861fd530a738e67f Stats: 46 lines in 9 files changed: 4 ins; 26 del; 16 mod 8275527: Refactor forward pointer access Reviewed-by: tschatzl, stefank ------------- PR: https://git.openjdk.java.net/jdk/pull/5955 From smarks at openjdk.java.net Thu Nov 18 21:54:45 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Thu, 18 Nov 2021 21:54:45 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 06:47:05 GMT, Aleksey Shipilev wrote: >> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename InstanceKlass::finalization_enabled to is_finalization_enabled. Minor cleanups. > > src/hotspot/share/prims/jvm.cpp line 694: > >> 692: >> 693: JVM_ENTRY(jboolean, JVM_IsFinalizationEnabled(JNIEnv * env)) >> 694: return InstanceKlass::finalization_enabled() ? 
JNI_TRUE : JNI_FALSE; > > Suggestion: > > return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE; Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From coleenp at openjdk.java.net Thu Nov 18 22:04:58 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 18 Nov 2021 22:04:58 GMT Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for Message-ID: Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass. Tested with mach5 tier1-3. ------------- Commit messages: - 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for Changes: https://git.openjdk.java.net/jdk/pull/6466/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6466&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277342 Stats: 14 lines in 2 files changed: 0 ins; 12 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6466.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6466/head:pull/6466 PR: https://git.openjdk.java.net/jdk/pull/6466 From smarks at openjdk.java.net Thu Nov 18 22:07:41 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Thu, 18 Nov 2021 22:07:41 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: <3QzIh53czhYZl6kAtuP4lbxnBXY_eb5gmR1fN-WnBiY=.9d088dde-61c0-4151-8880-d7b7a66c317d@github.com> References: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com> <3QzIh53czhYZl6kAtuP4lbxnBXY_eb5gmR1fN-WnBiY=.9d088dde-61c0-4151-8880-d7b7a66c317d@github.com> Message-ID: On Thu, 18 Nov 2021 21:19:44 GMT, Stuart Marks wrote: >> One typical way for VM to pass the arguments to the library is via private system properties. 
System::initPhase1 will save the VM properties in `jdk.internal.misc.VM` and filters out the private properties from the system properties returned from System::getProperties (see System::createProperties). >> >> You can query the flag via `jdk.internal.misc.VM.getProperty("jdk.finalization.disabled")` for example. >> >> I don't see any issue moving the Finalizer class initialization after initPhase1 since there is no finalizer during VM startup. > > I renamed the function to `is_finalization_enabled` per previous comment, and I also made these cleanups. Regarding using system properties, my initial prototype did this in the launcher, and it did run into the problem that the Finalizer class is initialized before system properties are available. That's why I created the Holder class, so that reading the property could be delayed until the first upcall to Finalizer::register. I suppose the initialization of Finalizer could be moved later, but that seems more invasive. The flag needs to be available in the VM in order to avoid upcalls for instances-with-finalizers in the first place. Alan had [suggested](https://bugs.openjdk.java.net/browse/JDK-8276422?focusedCommentId=14456185&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14456185) moving the argument processing into the VM, and David suggested putting the flag into InstanceKlass, which seems a sensible place to me. It's also reasonably accessible there to GC implementations, should they want to inspect it. 
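For readers unfamiliar with the Holder approach described above, a minimal sketch of the lazy-holder idiom follows. It is not the code from the PR: `EarlyClass`, `LOG`, `touch`, `register`, and the property name `example.finalization.disabled` are all made up for illustration. The point is only that the nested class's static initializer does not run when the outer class is initialized, but on first access to `Holder.ENABLED`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a class that is initialized very early in startup.
final class EarlyClass {
    // Records initialization order so the deferral is observable.
    static final List<String> LOG = new ArrayList<>();

    static {
        LOG.add("EarlyClass initialized");
    }

    // Nested holder: initialized only when ENABLED is first read.
    private static final class Holder {
        static final boolean ENABLED = readFlag();

        private static boolean readFlag() {
            LOG.add("Holder initialized");
            // Assumed property name, for illustration only.
            return !Boolean.getBoolean("example.finalization.disabled");
        }
    }

    // Forces EarlyClass initialization without touching Holder.
    static void touch() { }

    // Stand-in for the first upcall that actually needs the flag.
    static boolean register() {
        return Holder.ENABLED;
    }
}
```

Calling `EarlyClass.touch()` logs only the outer class's initialization; the property is not read until the first `register()` call, by which time system properties would be available in the scenario discussed here.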
------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From dcubed at openjdk.java.net Thu Nov 18 22:28:41 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 18 Nov 2021 22:28:41 GMT Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 21:56:58 GMT, Coleen Phillimore wrote: > Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass. > Tested with mach5 tier1-3. @coleenp - The original failure happened in Tier5... ------------- PR: https://git.openjdk.java.net/jdk/pull/6466 From mchung at openjdk.java.net Thu Nov 18 22:42:39 2021 From: mchung at openjdk.java.net (Mandy Chung) Date: Thu, 18 Nov 2021 22:42:39 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: References: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com> <3QzIh53czhYZl6kAtuP4lbxnBXY_eb5gmR1fN-WnBiY=.9d088dde-61c0-4151-8880-d7b7a66c317d@github.com> Message-ID: <1QdFtRR9FuHw4CehdL7NxWSEsdCHU6roiGiYQEJlEO0=.ed86261e-63ca-4684-93da-f153e252643e@github.com> On Thu, 18 Nov 2021 22:04:52 GMT, Stuart Marks wrote: >> I renamed the function to `is_finalization_enabled` per previous comment, and I also made these cleanups. > > Regarding using system properties, my initial prototype did this in the launcher, and it did run into the problem that the Finalizer class is initialized before system properties are available. That's why I created the Holder class, so that reading the property could be delayed until the first upcall to Finalizer::register. I suppose the initialization of Finalizer could be moved later, but that seems more invasive. > > The flag needs to be available in the VM in order to avoid upcalls for instances-with-finalizers in the first place. 
Alan had [suggested](https://bugs.openjdk.java.net/browse/JDK-8276422?focusedCommentId=14456185&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14456185) moving the argument processing into the VM, and David suggested putting the flag into InstanceKlass, which seems a sensible place to me. It's also reasonably accessible there to GC implementations, should they want to inspect it. > Alan had suggested moving the argument processing into the VM, and David suggested putting the flag into InstanceKlass, which seems a sensible place to me. It's also reasonably accessible there to GC implementations, should they want to inspect it. That's still all good. What I meant is for the VM (not the launcher) to add a private system property to pass the flag to the library code. There is precedent for this in `sun.nio.MaxDirectMemorySize` and `java.lang.Integer.IntegerCache.high`. ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From david.holmes at oracle.com Thu Nov 18 23:23:37 2021 From: david.holmes at oracle.com (David Holmes) Date: Fri, 19 Nov 2021 09:23:37 +1000 Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: References: <6s-4rTTyX8qZpavXbov9l2-H6BU7QLA0c71-K6xSQUM=.54dd6db2-5e78-4292-8958-72cadd762154@github.com> Message-ID: <3d0c8442-459c-54ef-6693-6e09cbfa5bbd@oracle.com> Hi Mandy, On 19/11/2021 6:16 am, Mandy Chung wrote: > On Thu, 18 Nov 2021 06:49:03 GMT, Kim Barrett wrote: > >>> src/hotspot/share/prims/jvm.cpp line 694: >>> >>>> 692: >>>> 693: JVM_ENTRY(jboolean, JVM_IsFinalizationEnabled(JNIEnv * env)) >>>> 694: return InstanceKlass::finalization_enabled() ? JNI_TRUE : JNI_FALSE; >>> >>> missing indentation >> >> I think this could just be `return InstanceKlass::finalization_enabled();`. There is lots of code in this file and elsewhere that assumes C++ `bool` converts to `jboolean` appropriately.
> > One typical way for VM to pass the arguments to the library is via private system properties. System::initPhase1 will save the VM properties in `jdk.internal.misc.VM` and filters out the private properties from the system properties returned from System::getProperties (see System::createProperties). The Finalizer class is initialized before initPhase1() happens. So to use a property the Holder class had to be introduced to be initialized after initPhase1(). There is always a choice of having the VM push up a system property to the Java code, or the Java code calling down to query the VM. The VM call seems simpler/cheaper/cleaner in this case. Cheers, David > You can query the flag via `jdk.internal.misc.VM.getProperty("jdk.finalization.disabled")` for example. > > I don't see any issue moving the Finalizer class initialization after initPhase1 since there is no finalizer during VM startup. > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/6442 > From coleenp at openjdk.java.net Thu Nov 18 23:27:47 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 18 Nov 2021 23:27:47 GMT Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 22:25:35 GMT, Daniel D. Daugherty wrote: >> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass. >> Tested with mach5 tier1-3. > > @coleenp - The original failure happened in Tier5... @dcubed-ojdk thanks Dan. I'll rerun tier5 on our default platforms. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6466 From dholmes at openjdk.java.net Thu Nov 18 23:27:50 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Nov 2021 23:27:50 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com> References: <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com> Message-ID: On Thu, 18 Nov 2021 20:05:15 GMT, Stuart Marks wrote: >> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. >> >> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421). > > Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: > > Rename InstanceKlass::finalization_enabled to is_finalization_enabled. Minor cleanups. Marked as reviewed by dholmes (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From bchristi at openjdk.java.net Thu Nov 18 23:39:43 2021 From: bchristi at openjdk.java.net (Brent Christian) Date: Thu, 18 Nov 2021 23:39:43 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: <0NKLwjDgge6gVJVHZr8o87VQvjz2FhNte1UPgiqs9qA=.c4eda9c4-d8a7-4ccc-8c64-9967e3c2923c@github.com> References: <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com> <0NKLwjDgge6gVJVHZr8o87VQvjz2FhNte1UPgiqs9qA=.c4eda9c4-d8a7-4ccc-8c64-9967e3c2923c@github.com> Message-ID: On Thu, 18 Nov 2021 21:15:11 GMT, Mandy Chung wrote: > When the finalization is disabled, perhaps jcmd GC.finalizer_info should just be made as a nop in the VM. Would it be interesting (perhaps in a follow-up) for GC.finalizer_info to report that the given VM had finalization disabled? ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From smarks at openjdk.java.net Fri Nov 19 00:14:18 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Fri, 19 Nov 2021 00:14:18 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v4] In-Reply-To: References: Message-ID: > Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. > > Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421). Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: Remove Finalizer.Holder class. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6442/files - new: https://git.openjdk.java.net/jdk/pull/6442/files/5df8bf9f..e357eeec Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6442&range=02-03 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6442.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6442/head:pull/6442 PR: https://git.openjdk.java.net/jdk/pull/6442 From smarks at openjdk.java.net Fri Nov 19 00:17:41 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Fri, 19 Nov 2021 00:17:41 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v4] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 04:13:21 GMT, Jaikiran Pai wrote: >> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove Finalizer.Holder class. > > src/java.base/share/classes/java/lang/ref/Finalizer.java line 195: > >> 193: >> 194: static { >> 195: if (Holder.ENABLED) { > > Hello Stuart, > My understanding of the lazy `Holder` is that it's there to delay the static initialization of the code that's part of the `Holder`. In this case, the `Holder` is being used right within the `static` block of the `Finalizer` class, and as the very first thing at that. Given this, is the `Holder` class necessary? I pushed an update to remove the Holder class. It seems to continue to work fine. Thanks for pointing this out @jaikiran! ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From dholmes at openjdk.java.net Fri Nov 19 01:02:41 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 19 Nov 2021 01:02:41 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v4] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 00:14:18 GMT, Stuart Marks wrote: >> Pretty much what it says.
The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. >> >> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421). > > Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: > > Remove Finalizer.Holder class. Good simplification. src/java.base/share/classes/java/lang/ref/Finalizer.java line 64: > 62: } > 63: > 64: static final boolean ENABLED = isFinalizationEnabled(); private? ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6442 From dholmes at openjdk.java.net Fri Nov 19 01:06:38 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 19 Nov 2021 01:06:38 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: References: <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com> <0NKLwjDgge6gVJVHZr8o87VQvjz2FhNte1UPgiqs9qA=.c4eda9c4-d8a7-4ccc-8c64-9967e3c2923c@github.com> Message-ID: On Thu, 18 Nov 2021 23:36:23 GMT, Brent Christian wrote: > When the finalization is disabled, perhaps jcmd GC.finalizer_info should just be made as a nop in the VM. Yes that is a trivial change to add. @stuart-marks I can provide the code. You can choose whether to include in this PR or else we can do a follow-up. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From dholmes at openjdk.java.net Fri Nov 19 01:34:45 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 19 Nov 2021 01:34:45 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v4] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 00:14:18 GMT, Stuart Marks wrote: >> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. >> >> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421). > > Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: > > Remove Finalizer.Holder class. @stuart-marks : https://github.com/openjdk/jdk/pull/6469 (didn't intend to actually make a PR but clicked the wrong part of the button :) ) ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From dholmes at openjdk.java.net Fri Nov 19 02:05:42 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 19 Nov 2021 02:05:42 GMT Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 21:56:58 GMT, Coleen Phillimore wrote: > Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass. > Tested with mach5 tier1-3. Hi Coleen, The changes in themselves seem fine. My only concern is whether always locking will introduce contention and impact performance. 
The code was attempting the classic pattern of doing a lock-free query first, but as you note it lacks the necessary memory ordering operations. So if needed we could make the lock-free path work correctly. Thanks, David src/hotspot/share/oops/instanceKlass.cpp line 2064: > 2062: } > 2063: > 2064: /* jni_id_forfor jfieldIds only */ space needed between for's :) src/hotspot/share/oops/instanceKlass.cpp line 2067: > 2065: JNIid* InstanceKlass::jni_id_for(int offset) { > 2066: MutexLocker ml(JfieldIdCreation_lock); > 2067: // Retry lookup after we got the lock The comment doesn't make sense now as there is only one lookup. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6466 From smarks at openjdk.java.net Fri Nov 19 02:32:41 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Fri, 19 Nov 2021 02:32:41 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: References: <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com> <0NKLwjDgge6gVJVHZr8o87VQvjz2FhNte1UPgiqs9qA=.c4eda9c4-d8a7-4ccc-8c64-9967e3c2923c@github.com> Message-ID: On Fri, 19 Nov 2021 01:03:22 GMT, David Holmes wrote: > > When the finalization is disabled, perhaps jcmd GC.finalizer_info should just be made as a nop in the VM. > > Yes that is a trivial change to add. @stuart-marks I can provide the code. You can choose whether to include in this PR or else we can do a follow-up. Seems simple enough. Is there any testing that needs to be done for this? Does jcmd output require CSR review? I guess there would be a compatibility issue if there were something that was parsing the output of jcmd. Or is it solely intended to be read by humans? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From smarks at openjdk.java.net Fri Nov 19 02:35:44 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Fri, 19 Nov 2021 02:35:44 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v4] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 00:59:10 GMT, David Holmes wrote: >> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove Finalizer.Holder class. > > src/java.base/share/classes/java/lang/ref/Finalizer.java line 64: > >> 62: } >> 63: >> 64: static final boolean ENABLED = isFinalizationEnabled(); > > private? Yeah, probably should be private. Other stuff in this class is private except things that are used from outside. ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From coleenp at openjdk.java.net Fri Nov 19 02:39:15 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 19 Nov 2021 02:39:15 GMT Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for [v2] In-Reply-To: References: Message-ID: > Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass. > Tested with mach5 tier1-3. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix comments. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6466/files - new: https://git.openjdk.java.net/jdk/pull/6466/files/47cb164b..c44b86d5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6466&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6466&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6466.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6466/head:pull/6466 PR: https://git.openjdk.java.net/jdk/pull/6466 From coleenp at openjdk.java.net Fri Nov 19 02:42:41 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 19 Nov 2021 02:42:41 GMT Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for [v2] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 02:39:15 GMT, Coleen Phillimore wrote: >> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass. >> Tested with mach5 tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments. Thanks for reviewing. I fixed the comments. I will see if I can find some jni performance tests tomorrow. 
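For reference, the lock-free first query that the review mentions could be made correct with acquire/release ordering roughly as follows. This is only a sketch under stated assumptions: `IdNode` and `IdCache` are hypothetical stand-ins, not HotSpot's `JNIid` or `InstanceKlass`, and `std::atomic` / `std::mutex` take the place of HotSpot's own atomic and `MutexLocker` facilities. It also assumes nodes are only ever prepended and never modified after they are published.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a per-field id record (not HotSpot's JNIid).
struct IdNode {
  int offset;
  IdNode* next;
};

// Hypothetical cache of ids keyed by field offset.
class IdCache {
 public:
  ~IdCache() {
    IdNode* n = head_.load(std::memory_order_relaxed);
    while (n != nullptr) {
      IdNode* next = n->next;
      delete n;
      n = next;
    }
  }

  IdNode* id_for(int offset) {
    // Fast path: the acquire load pairs with the release store below, so a
    // reader that sees a node also sees its fully initialized fields.
    IdNode* probe = find(head_.load(std::memory_order_acquire), offset);
    if (probe != nullptr) {
      return probe;
    }
    std::lock_guard<std::mutex> guard(lock_);
    // Retry the lookup under the lock: another thread may have added the node.
    IdNode* head = head_.load(std::memory_order_relaxed);
    probe = find(head, offset);
    if (probe == nullptr) {
      probe = new IdNode{offset, head};
      // Publish the new node only after it has been fully constructed.
      head_.store(probe, std::memory_order_release);
    }
    return probe;
  }

 private:
  static IdNode* find(IdNode* n, int offset) {
    for (; n != nullptr; n = n->next) {
      if (n->offset == offset) {
        return n;
      }
    }
    return nullptr;
  }

  std::atomic<IdNode*> head_{nullptr};
  std::mutex lock_;
};
```

Whether the extra complexity is worth it depends on whether always taking the lock shows up as measurable contention, which is the open question in this thread.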
------------- PR: https://git.openjdk.java.net/jdk/pull/6466 From jpai at openjdk.java.net Fri Nov 19 04:24:37 2021 From: jpai at openjdk.java.net (Jaikiran Pai) Date: Fri, 19 Nov 2021 04:24:37 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v4] In-Reply-To: References: Message-ID: <75LlCxFbmJ2QwkczlGFpPQN7Gl3gAsfylqufYtIOkcI=.427249fe-b817-4eae-9395-9c7095d05839@github.com> On Fri, 19 Nov 2021 00:14:34 GMT, Stuart Marks wrote: >> src/java.base/share/classes/java/lang/ref/Finalizer.java line 195: >> >>> 193: >>> 194: static { >>> 195: if (Holder.ENABLED) { >> >> Hello Stuart, >> My understanding of the lazy `Holder` is that it's there to delay the static initialization of the code that's part of the `Holder`. In this case, the `Holder` is being used right within the `static` block of the `Finalizer` class, and as the first thing at that. In this case, is that `Holder` class necessary? > > I pushed an update to remove the Holder class. It seems to continue to work fine. Thanks for pointing this out @jaikiran! Thank you Stuart, this changed version looks fine to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From dholmes at openjdk.java.net Fri Nov 19 04:59:39 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 19 Nov 2021 04:59:39 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v3] In-Reply-To: References: <4fYIR8SXkUTipn7wyUCWmBYgYwsL1MRd1bKmxyV6YLk=.9a9aef46-d959-402a-a6d8-c0fb65f06983@github.com> <0NKLwjDgge6gVJVHZr8o87VQvjz2FhNte1UPgiqs9qA=.c4eda9c4-d8a7-4ccc-8c64-9967e3c2923c@github.com> Message-ID: On Fri, 19 Nov 2021 02:29:33 GMT, Stuart Marks wrote: >>> When the finalization is disabled, perhaps jcmd GC.finalizer_info should just be made as a nop in the VM. >> >> Yes that is a trivial change to add. @stuart-marks I can provide the code. You can choose whether to include in this PR or else we can do a follow-up.
> > Seems simple enough. Is there any testing that needs to be done for this? Does jcmd output require CSR review? I guess there would be a compatibility issue if there were something that was parsing the output of jcmd. Or is it solely intended to be read by humans? @stuart-marks No CSR needed for this as no output format is specified. Plus this command already has a simple text response when there are no finalizers queued. E.g.
```
> ../build/linux-x64-debug-finalization/images/jdk/bin/jcmd 27939 GC.finalizer_info
27939:
No instances waiting for finalization found
```
so when finalization is disabled this just becomes:
```
> ../build/linux-x64-debug-finalization/images/jdk/bin/jcmd 28018 GC.finalizer_info
28018:
Finalization is disabled
```
There is a test for this Dcmd, but it doesn't test the "nothing here" case so I don't think it is necessary to augment it for this case: `hotspot/jtreg/serviceability/dcmd/gc/FinalizerInfoTest.java` ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From kbarrett at openjdk.java.net Fri Nov 19 05:48:51 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 19 Nov 2021 05:48:51 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v4] In-Reply-To: References: Message-ID: <_pYTRBr5EsVrnnMnDOgp1KiJGvDpyqb5cPVgWRu88mA=.abedd733-19e3-4f57-9714-cfe7da8f96d1@github.com> On Fri, 19 Nov 2021 00:14:18 GMT, Stuart Marks wrote: >> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java.
This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. >> >> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421). > > Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: > > Remove Finalizer.Holder class. Marked as reviewed by kbarrett (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From thartmann at openjdk.java.net Fri Nov 19 07:07:38 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 19 Nov 2021 07:07:38 GMT Subject: RFR: 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg In-Reply-To: References: Message-ID: <510uXpyEyiEC6_m1bWk_USuJ1kpv5QSgImi1NheRBkw=.52c4fbdf-713c-4868-ade3-2d52c155d915@github.com> On Wed, 17 Nov 2021 11:41:04 GMT, Tobias Hartmann wrote: > Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment: > https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541 > > The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though because C2 might still compile that path when not being able to prove that it's unreachable at parse time. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in ` VectorSupport::vop2ideal`. 
We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0: > https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394 > > Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug? > > Thanks, > Tobias Thanks for checking! ------------- PR: https://git.openjdk.java.net/jdk/pull/6428 From thartmann at openjdk.java.net Fri Nov 19 07:10:49 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 19 Nov 2021 07:10:49 GMT Subject: Integrated: 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 11:41:04 GMT, Tobias Hartmann wrote: > Code in `LongVector::lanewiseTemplate` currently implements the `NEG` operation as a `SUB` and has a corresponding `FIXME` comment: > https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L534-L541 > > The implicit assumption is that since we will never pass `NEG` to `VectorSupport.unaryOp` in line 540, the corresponding C2 intrinsic does not need to handle that case. That's not guaranteed though because C2 might still compile that path when not being able to prove that it's unreachable at parse time. As a result, we then assert in the intrinsic because the negation operation on a long vector is currently not supported (i.e. there is no `Op_NegVL`). I propose to simply handle this case in ` VectorSupport::vop2ideal`. 
We will then bail out from intrinsification with `operation not supported: opc=NegL bt=long` because `VectorNode::opcode` returns 0: > https://github.com/openjdk/jdk/blob/e9934e1243929514e147ecdd3cefa74168ed0500/src/hotspot/share/opto/vectorIntrinsics.cpp#L390-L394 > > Question to the Vector API experts: There are other `FIXME: Support this in the JIT` comments in the code. Do these code paths suffer from similar issues? Is there a tracking RFE/bug? > > Thanks, > Tobias This pull request has now been integrated. Changeset: 47564cae Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/47564caeb0628e5c03a0e7f04093adce77d6dd3b Stats: 51 lines in 2 files changed: 51 ins; 0 del; 0 mod 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg Reviewed-by: chagedorn, sviswanathan ------------- PR: https://git.openjdk.java.net/jdk/pull/6428 From thartmann at openjdk.java.net Fri Nov 19 07:12:54 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 19 Nov 2021 07:12:54 GMT Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable inline caches [v2] In-Reply-To: References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com> Message-ID: On Thu, 18 Nov 2021 15:26:10 GMT, Stefan Karlsson wrote: >> We got a report on the zgc-dev list about a large performance issue affecting ZGC: >> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html >> >> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times: >> >> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms >> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms >> >> and while this were happening we got a huge number of ICBufferFull safepoints. 
>> >> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null: >> >> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103 >> >> CompiledIC *ic = CompiledIC_at(iter.reloc()); >> oop ic_oop = ic->cached_oop(); >> if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) { >> // The only exception is compiledICHolder oops which may >> // yet be marked below. (We check this further below). >> if (ic_oop->is_compiledICHolder()) { >> compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop); >> if (is_alive->do_object_b( >> cichk_oop->holder_method()->method_holder()) && >> is_alive->do_object_b(cichk_oop->holder_klass())) { >> continue; >> } >> } >> ic->set_to_clean(); >> assert(ic->cached_oop() == NULL, >> "cached oop in IC should be cleared"); >> } >> } >> >> >> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL: >> >> CompiledIC *ic = CompiledIC_at(iter.reloc()); >> if (ic->is_icholder_call()) { >> // The only exception is compiledICHolder oops which may >> // yet be marked below. (We check this further below). >> CompiledICHolder* cichk_oop = ic->cached_icholder(); >> if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) && >> cichk_oop->holder_klass()->is_loader_alive(is_alive)) { >> continue; >> } >> } else { >> Metadata* ic_oop = ic->cached_metadata(); >> if (ic_oop != NULL) { >> if (ic_oop->is_klass()) { >> if (((Klass*)ic_oop)->is_loader_alive(is_alive)) { >> continue; >> } >> } else if (ic_oop->is_method()) { >> if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) { >> continue; >> } >> } else { >> ShouldNotReachHere(); >> } >> } >> } >> ic->set_to_clean(); >> } >> >> >> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change. 
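The scoping difference between the two snippets above can be modelled in a few lines. This is a simplified sketch, not HotSpot code: `Metadata`, `InlineCache`, `clean_rewritten` and `clean_intended` are hypothetical names, the icholder branch is left out, and an early `return` stands in for the `continue` in the real loop.

```cpp
#include <cassert>

// Hypothetical model types; these are not the HotSpot classes.
enum class Liveness { Alive, Dead };

struct Metadata {
  Liveness loader;
};

struct InlineCache {
  const Metadata* cached;  // nullptr models a megamorphic vtable call (or clean)
  bool cleaned;

  explicit InlineCache(const Metadata* m) : cached(m), cleaned(false) {}
  void set_to_clean() { cleaned = true; }
};

// Shape of the rewritten code: a null cached value skips the liveness
// checks and falls through to set_to_clean().
void clean_rewritten(InlineCache& ic) {
  const Metadata* m = ic.cached;
  if (m != nullptr) {
    if (m->loader == Liveness::Alive) {
      return;  // 'continue' in the original loop
    }
  }
  ic.set_to_clean();
}

// Shape of the original code: only a non-null, dead cached value is cleaned.
void clean_intended(InlineCache& ic) {
  const Metadata* m = ic.cached;
  if (m != nullptr && m->loader == Liveness::Dead) {
    ic.set_to_clean();
  }
}
```

With a null cached value, `clean_rewritten` cleans the cache while `clean_intended` leaves it alone; for non-null values the two behave the same.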
>> >> To understand why this is causing the problems we are seeing, it's good to start by reading: >> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall >> >> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code). >> >> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints. >> >> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches and exit the class unloading phase and proceed to the phase where memory is reclaimed.
You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly. >> >> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtable inline caches are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints. >> >> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in: >> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html >> >> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test). >> >> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels. >> >> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
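One way to take note of the ICBufferFull safepoint lines is to count them and add up their `Total:` times. The helper below is a hypothetical sketch, not an existing tool: `SafepointSummary` and `summarize_icbufferfull` are made-up names, and it assumes each safepoint line contains `Safepoint "ICBufferFull"` and ends with `Total: <n> ns`, as in the logs quoted in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical summary of ICBufferFull safepoints found in a log.
struct SafepointSummary {
  std::size_t count = 0;       // number of ICBufferFull safepoint lines
  std::uint64_t total_ns = 0;  // sum of their "Total: <n> ns" values
};

// Scans -Xlog:safepoint output line by line. The line format is assumed
// from the example logs in this thread, not from a documented specification.
SafepointSummary summarize_icbufferfull(std::istream& log) {
  SafepointSummary summary;
  const std::string marker = "Safepoint \"ICBufferFull\"";
  const std::string total_key = "Total: ";
  std::string line;
  while (std::getline(log, line)) {
    if (line.find(marker) == std::string::npos) {
      continue;  // not an ICBufferFull safepoint line
    }
    ++summary.count;
    const std::size_t pos = line.rfind(total_key);
    const std::size_t digits = (pos == std::string::npos)
                                   ? std::string::npos
                                   : pos + total_key.size();
    // Only parse when a digit actually follows "Total: ".
    if (digits != std::string::npos && digits < line.size() &&
        line[digits] >= '0' && line[digits] <= '9') {
      summary.total_ns += std::stoull(line.substr(digits));
    }
  }
  return summary;
}
```

A high count within a single "Concurrent Process Non-Strong References" phase is the pattern described above.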
>> >> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints: >> >> [38.557s][1637062062666ms][info ][gc,phases ] GC(222) Concurrent Mark Free 0.001ms >> [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns >> [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns >> [38.566s][1637062062674ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns >> [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns >> [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns >> [38.568s][1637062062677ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns >> [38.570s][1637062062678ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns >> [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns >> [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns >> [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns >> [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since 
last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns >> [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns >> [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns >> [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns >> [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns >> [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns >> [38.577s][1637062062685ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns >> [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns >> [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns >> [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns >> [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns >> [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns >> 
[38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns >> [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns >> [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns >> [38.580s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns >> [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns >> [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns >> [38.581s][1637062062690ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns >> [38.582s][1637062062691ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns >> [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns >> [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns >> [38.585s][1637062062693ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns >> [38.585s][1637062062694ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12200 ns, 
Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns >> [38.586s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns >> [38.587s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns >> [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns >> [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns >> [38.589s][1637062062697ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns >> [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns >> [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns >> [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns >> [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns >> [38.591s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns >> [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns >> 
[38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns >> [38.592s][1637062062701ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns >> [38.594s][1637062062702ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns >> [38.595s][1637062062703ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns >> [38.597s][1637062062705ms][info ][gc,phases ] GC(222) Concurrent Process Non-Strong References 39.443ms >> >> >> Example logs from G1 where the Java threads fixes the cleaned inline caches and run out of ICStubs: >> >> [125.998s][1637065197322ms][info ][gc ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms >> [125.998s][1637065197322ms][info ][gc,cpu ] GC(1040) User=0.08s Sys=0.00s Real=0.01s >> [125.998s][1637065197322ms][info ][safepoint ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns >> [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Mark 38.296ms >> [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets >> [126.001s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns >> [126.002s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns >> [126.007s][1637065197331ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns >> 
[126.009s][1637065197334ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns >> [126.027s][1637065197352ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms >> >> >> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, and Renaissance. >> >> I've run the patch through tier1-7. >> >> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for the discussion and explanation of the inline caches code. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Review Coleen Good catch and great summary! ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6450 From stefank at openjdk.java.net Fri Nov 19 08:04:43 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Fri, 19 Nov 2021 08:04:43 GMT Subject: RFR: 8277212: GC accidentally cleans valid megamorphic vtable inline caches [v2] In-Reply-To: References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com> Message-ID: <0tGQlAt66ViSYdooeKTmTfDeOQXmKwxN_0U7CVZ5BTw=.b6ce9010-04ba-463f-8c34-c5ddb2ac1e53@github.com> On Thu, 18 Nov 2021 15:26:10 GMT, Stefan Karlsson wrote: >> We got a report on the zgc-dev list about a large performance issue affecting ZGC: >> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html >> >> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times: >> >> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms >> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms >> >> and while this was happening we got a huge number of ICBufferFull safepoints.
>> >> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null: >> >> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103 >> >> CompiledIC *ic = CompiledIC_at(iter.reloc()); >> oop ic_oop = ic->cached_oop(); >> if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) { >> // The only exception is compiledICHolder oops which may >> // yet be marked below. (We check this further below). >> if (ic_oop->is_compiledICHolder()) { >> compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop); >> if (is_alive->do_object_b( >> cichk_oop->holder_method()->method_holder()) && >> is_alive->do_object_b(cichk_oop->holder_klass())) { >> continue; >> } >> } >> ic->set_to_clean(); >> assert(ic->cached_oop() == NULL, >> "cached oop in IC should be cleared"); >> } >> } >> >> >> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL: >> >> CompiledIC *ic = CompiledIC_at(iter.reloc()); >> if (ic->is_icholder_call()) { >> // The only exception is compiledICHolder oops which may >> // yet be marked below. (We check this further below). >> CompiledICHolder* cichk_oop = ic->cached_icholder(); >> if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) && >> cichk_oop->holder_klass()->is_loader_alive(is_alive)) { >> continue; >> } >> } else { >> Metadata* ic_oop = ic->cached_metadata(); >> if (ic_oop != NULL) { >> if (ic_oop->is_klass()) { >> if (((Klass*)ic_oop)->is_loader_alive(is_alive)) { >> continue; >> } >> } else if (ic_oop->is_method()) { >> if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) { >> continue; >> } >> } else { >> ShouldNotReachHere(); >> } >> } >> } >> ic->set_to_clean(); >> } >> >> >> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change. 
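The scoping difference between the two quoted snippets can be modelled in isolation. The following is only a minimal sketch: `FakeIC`, `IcKind`, `clean_buggy` and `clean_fixed` are hypothetical stand-ins invented for illustration, not HotSpot types or the actual patch, and `NoMetadata` is assumed to model an inline cache whose cached metadata is NULL (a megamorphic vtable call or an already clean cache).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins; none of this is the real CompiledIC API.
enum class IcKind { IcHolder, CachedMetadata, NoMetadata };

struct FakeIC {
    IcKind kind;          // NoMetadata models a NULL cached metadata (megamorphic vtable or clean)
    bool   loader_alive;  // whether the class loader of the cached metadata is still alive
    bool   cleaned;       // set once set_to_clean() has been called
    void set_to_clean() { cleaned = true; }
};

// Models the rewritten scoping: set_to_clean() is reached by every inline
// cache that did not hit a 'continue', including the NoMetadata ones.
inline void clean_buggy(std::vector<FakeIC>& ics) {
    for (FakeIC& ic : ics) {
        if (ic.kind == IcKind::IcHolder) {
            if (ic.loader_alive) continue;
        } else if (ic.kind == IcKind::CachedMetadata) {
            if (ic.loader_alive) continue;
        }
        ic.set_to_clean();
    }
}

// Models the intended scoping: only inline caches that reference metadata
// with a dead class loader are cleaned; NoMetadata caches are left alone.
inline void clean_fixed(std::vector<FakeIC>& ics) {
    for (FakeIC& ic : ics) {
        if (ic.kind == IcKind::NoMetadata) continue;
        if (ic.loader_alive) continue;
        ic.set_to_clean();
    }
}
```

Running both functions over the same input shows that they differ only for the `NoMetadata` entries, which is the case the description identifies as accidentally cleaned.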
>>
>> To understand why this is causing the problems we are seeing, it's good to start by reading:
>> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
>>
>> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
>>
>> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer. When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
>>
>> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to being megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches and exit the class unloading phase and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
>>
>> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
>>
>> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
>> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
>>
>> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
>>
>> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
>>
>> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
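When going through such a log by hand gets tedious, the ICBufferFull lines can be tallied with a few lines of code. This is a minimal sketch, not a HotSpot tool: `parse_safepoint_line`, `summarize_icbufferfull` and `IcBufferFullStats` are hypothetical names, and the parser assumes the exact safepoint line layout shown in the example logs in this thread (`Safepoint "<name>", ..., Total: <n> ns`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: if 'line' looks like a safepoint log line in the
// format shown in this thread, extract the safepoint name and the "Total:"
// duration in nanoseconds. Returns false for any other line.
inline bool parse_safepoint_line(const std::string& line,
                                 std::string& name, long long& total_ns) {
    const std::string name_tag = "Safepoint \"";
    std::size_t start = line.find(name_tag);
    if (start == std::string::npos) return false;
    start += name_tag.size();
    const std::size_t end = line.find('"', start);
    if (end == std::string::npos) return false;
    const std::string total_tag = "Total: ";
    const std::size_t total = line.find(total_tag, end);
    if (total == std::string::npos) return false;
    name = line.substr(start, end - start);
    total_ns = std::strtoll(line.c_str() + total + total_tag.size(), nullptr, 10);
    return true;
}

struct IcBufferFullStats {
    long      count = 0;     // number of ICBufferFull safepoints seen
    long long total_ns = 0;  // sum of their "Total:" durations
};

// Reads a log line by line and accumulates only the ICBufferFull safepoints.
inline IcBufferFullStats summarize_icbufferfull(std::istream& in) {
    IcBufferFullStats stats;
    std::string line, name;
    long long ns = 0;
    while (std::getline(in, line)) {
        if (parse_safepoint_line(line, name, ns) && name == "ICBufferFull") {
            ++stats.count;
            stats.total_ns += ns;
        }
    }
    return stats;
}
```

A caller would pass an `std::ifstream` opened on the log file; comparing the resulting count before and after a fix gives a rough measure of how many of these safepoints were avoided.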
>> >> Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints: >> >> [38.557s][1637062062666ms][info ][gc,phases ] GC(222) Concurrent Mark Free 0.001ms >> [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns >> [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns >> [38.566s][1637062062674ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns >> [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns >> [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns >> [38.568s][1637062062677ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns >> [38.570s][1637062062678ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns >> [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns >> [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns >> [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns >> [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since 
last: 15110 ns, Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns >> [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns >> [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns >> [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns >> [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns >> [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns >> [38.577s][1637062062685ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns >> [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns >> [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns >> [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns >> [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns >> [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns >> 
[38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns >> [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns >> [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns >> [38.580s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns >> [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns >> [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns >> [38.581s][1637062062690ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns >> [38.582s][1637062062691ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns >> [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns >> [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns >> [38.585s][1637062062693ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns >> [38.585s][1637062062694ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12200 ns, 
Reaching safepoint: 80168 ns, At safepoint: 7480 ns, Total: 87648 ns >> [38.586s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns >> [38.587s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns >> [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns >> [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns >> [38.589s][1637062062697ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns >> [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns >> [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns >> [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns >> [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns >> [38.591s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns >> [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns >> 
[38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns >> [38.592s][1637062062701ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns >> [38.594s][1637062062702ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns >> [38.595s][1637062062703ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns >> [38.597s][1637062062705ms][info ][gc,phases ] GC(222) Concurrent Process Non-Strong References 39.443ms >> >> >> Example logs from G1 where the Java threads fixes the cleaned inline caches and run out of ICStubs: >> >> [125.998s][1637065197322ms][info ][gc ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms >> [125.998s][1637065197322ms][info ][gc,cpu ] GC(1040) User=0.08s Sys=0.00s Real=0.01s >> [125.998s][1637065197322ms][info ][safepoint ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns >> [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Mark 38.296ms >> [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets >> [126.001s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns >> [126.002s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns >> [126.007s][1637065197331ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns >> 
[126.009s][1637065197334ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
>> [126.027s][1637065197352ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
>>
>>
>> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
>>
>> I've run the patch through tier1-7.
>>
>> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code.
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review Coleen

Thanks all for reviewing!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6450

From jbhateja at openjdk.java.net Fri Nov 19 08:10:42 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 19 Nov 2021 08:10:42 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
In-Reply-To:
References:
Message-ID: <82Kgtn4RllwF2ifvmwtaQaeG9ADXeUoq290BKnd8PZ4=.ed410c36-2f5c-4b29-9d96-07d33ac872ee@github.com>

On Thu, 18 Nov 2021 06:55:34 GMT, Pengfei Li wrote:

>> Arraycopy partial inlining is a C2 compiler technique that avoids stub
>> call overhead in small-sized arraycopy operations by generating masked
>> vector instructions. So far it works on x86 AVX512 only and this patch
>> enables it on AArch64 with SVE.
>>
>> We add AArch64 matching rule for VectorMaskGenNode and refactor that
>> node a little bit. The major change is moving the element type field
>> into its TypeVectMask bottom type. The reason is that AArch64 vector
>> masks are different for different vector element types.
>>
>> E.g., an x86 AVX512 vector mask value masking 3 least significant vector
>> lanes (of any type) is like
>>
>> `0000 0000 ... 0000 0000 0000 0000 0111`
>>
>> On AArch64 SVE, this mask value can only be used for masking the 3 least
>> significant lanes of bytes. But for 3 lanes of ints, the value should be
>>
>> `0000 0000 ... 0000 0000 0001 0001 0001`
>>
>> where the least significant bit of each lane matters. So AArch64 matcher
>> needs to know the vector element type to generate right masks.
>>
>> After this patch, the C2 generated code for copying a 50-byte array on
>> AArch64 SVE looks like
>>
>>     mov     x12, #0x32
>>     whilelo p0.b, xzr, x12
>>     add     x11, x11, #0x10
>>     ld1b    {z16.b}, p0/z, [x11]
>>     add     x10, x10, #0x10
>>     st1b    {z16.b}, p0, [x10]
>>
>> We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on
>> both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested
>> JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
>> size arguments on a 512-bit SVE-featured CPU. We got below performance
>> data changes.
>>
>>     Benchmark                  (length)  (Performance)
>>     ArrayCopyAligned.testByte        10          -2.6%
>>     ArrayCopyAligned.testByte        20          +4.7%
>>     ArrayCopyAligned.testByte        30          +4.8%
>>     ArrayCopyAligned.testByte        40         +21.7%
>>     ArrayCopyAligned.testByte        50         +22.5%
>>     ArrayCopyAligned.testByte        60         +28.4%
>>
>> The test machine has SVE vector size of 512 bits, so we see performance
>> gain for most array sizes less than 64 bytes. For very small arrays we
>> see a bit regression because a vector load/store may be a bit slower
>> than 1 or 2 scalar loads/stores.
>
> The x86 failure is caused by a recent commit (see [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324)) and unrelated to this PR.

Hi @pfustc, common type system changes look good to me.
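The two mask layouts contrasted in the quoted description can be reproduced with a small calculation. This is a sketch of the bit patterns only, under the assumption that a 64-bit integer stands in for the mask of a 512-bit vector; `lane_mask` and `byte_granular_mask` are hypothetical helper names and do not correspond to C2 code or to the `whilelo` instruction itself.

```cpp
#include <cassert>
#include <cstdint>

// One mask bit per lane, regardless of element size (the AVX512-style layout
// from the quoted description).
inline std::uint64_t lane_mask(unsigned active_lanes) {
    return active_lanes >= 64 ? ~0ULL : ((1ULL << active_lanes) - 1);
}

// One mask bit per byte of vector data (the SVE-style layout from the quoted
// description): for each active lane only the lowest bit of that lane's
// group of 'elem_bytes' bits is set.
inline std::uint64_t byte_granular_mask(unsigned active_lanes, unsigned elem_bytes) {
    std::uint64_t mask = 0;
    for (unsigned lane = 0; lane < active_lanes; ++lane) {
        const unsigned bit = lane * elem_bytes;
        if (bit >= 64) break;  // beyond the modelled 512-bit vector
        mask |= 1ULL << bit;
    }
    return mask;
}
```

For three active lanes, `lane_mask(3)` and `byte_granular_mask(3, 1)` both give `0x7` (binary `0111`), while `byte_granular_mask(3, 4)` gives `0x111` (binary `0001 0001 0001`), which is why the element type has to be known when the mask is generated.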
-------------

PR: https://git.openjdk.java.net/jdk/pull/6444

From jbhateja at openjdk.java.net Fri Nov 19 08:24:42 2021
From: jbhateja at openjdk.java.net (Jatin Bhateja)
Date: Fri, 19 Nov 2021 08:24:42 GMT
Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE
In-Reply-To:
References:
Message-ID:

On Thu, 18 Nov 2021 03:50:45 GMT, Pengfei Li wrote:

> Arraycopy partial inlining is a C2 compiler technique that avoids stub
> call overhead in small-sized arraycopy operations by generating masked
> vector instructions. So far it works on x86 AVX512 only and this patch
> enables it on AArch64 with SVE.
>
> We add AArch64 matching rule for VectorMaskGenNode and refactor that
> node a little bit. The major change is moving the element type field
> into its TypeVectMask bottom type. The reason is that AArch64 vector
> masks are different for different vector element types.
>
> E.g., an x86 AVX512 vector mask value masking 3 least significant vector
> lanes (of any type) is like
>
> `0000 0000 ... 0000 0000 0000 0000 0111`
>
> On AArch64 SVE, this mask value can only be used for masking the 3 least
> significant lanes of bytes. But for 3 lanes of ints, the value should be
>
> `0000 0000 ... 0000 0000 0001 0001 0001`
>
> where the least significant bit of each lane matters. So AArch64 matcher
> needs to know the vector element type to generate right masks.
>
> After this patch, the C2 generated code for copying a 50-byte array on
> AArch64 SVE looks like
>
>     mov     x12, #0x32
>     whilelo p0.b, xzr, x12
>     add     x11, x11, #0x10
>     ld1b    {z16.b}, p0/z, [x11]
>     add     x10, x10, #0x10
>     st1b    {z16.b}, p0, [x10]
>
> We ran jtreg hotspot::hotspot_all, jdk::tier1~3 and langtools::tier1 on
> both x86 AVX512 and AArch64 SVE machines, no issue is found. We tested
> JMH org/openjdk/bench/java/lang/ArrayCopyAligned.java with small array
> size arguments on a 512-bit SVE-featured CPU. We got below performance
> data changes.
>
>     Benchmark                  (length)  (Performance)
>     ArrayCopyAligned.testByte        10          -2.6%
>     ArrayCopyAligned.testByte        20          +4.7%
>     ArrayCopyAligned.testByte        30          +4.8%
>     ArrayCopyAligned.testByte        40         +21.7%
>     ArrayCopyAligned.testByte        50         +22.5%
>     ArrayCopyAligned.testByte        60         +28.4%
>
> The test machine has SVE vector size of 512 bits, so we see performance
> gain for most array sizes less than 64 bytes. For very small arrays we
> see a bit regression because a vector load/store may be a bit slower
> than 1 or 2 scalar loads/stores.

Common type system changes look good to me.

-------------

Marked as reviewed by jbhateja (Committer).

PR: https://git.openjdk.java.net/jdk/pull/6444

From stuefe at openjdk.java.net Fri Nov 19 09:33:19 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 19 Nov 2021 09:33:19 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v4]
In-Reply-To:
References:
Message-ID:

> This is part of a number of RFEs I plan to improve and simplify C-heap overflow checking in hotspot.
>
> This proposal adds NMT buffer overflow checking:
>
> - it gives us C-heap overflow checking in release builds
> - the costs are negligible: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway.
> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check.
> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch.
>
> For more details, please see the JBS issue.
>
> ----
>
> Patch notes:
>
> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp.
> - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512.
> - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere.
>
> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios.
>
> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting.
>
> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE).
>
> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now.
>
> - I made the assert for malloc site table width a compile time STATIC_ASSERT.
>
> --------------
>
> Example output a buffer overrun would provide:
>
>
> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1:
> 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
> 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00
> 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)#
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805
> # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)
> # > > ------- > > Tests: > - manual tests with Linux x64, x86, minimal build > - GHAs all clean > - SAP nightlies ran for 4 weeks now without problems Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Extend gtests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5952/files - new: https://git.openjdk.java.net/jdk/pull/5952/files/a1611e78..d3677c1f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=02-03 Stats: 61 lines in 4 files changed: 47 ins; 0 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/5952.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5952/head:pull/5952 PR: https://git.openjdk.java.net/jdk/pull/5952 From stuefe at openjdk.java.net Fri Nov 19 09:33:20 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 19 Nov 2021 09:33:20 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks In-Reply-To: <6HZO-x_TuA4XOmbGfIo7DCXFgZCSNU69FkpWT8WWtL8=.6c3180cf-ced1-43d3-966f-3f21e9d3bffe@github.com> References: <_r5qw_r-3Be7zUJuf4gcb10MFe9varAWAvix_CaJiYs=.758ad563-f2d2-466d-bcb5-1ccc6b547e94@github.com> <6HZO-x_TuA4XOmbGfIo7DCXFgZCSNU69FkpWT8WWtL8=.6c3180cf-ced1-43d3-966f-3f21e9d3bffe@github.com> Message-ID: On Sun, 17 Oct 2021 13:30:17 GMT, Zhengyu Gu wrote: >>> > > > Sorry, we already have GuardedMemory for detecting buffer overrun, why introduce a new one? >>> > > >>> > > >>> > > GuardedMemory has a number of disadvantages, and I'd like to remove it in favor of NMT doing buffer overrun checks. For my full reasoning, please see my reasoning in the umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301: >>> > > Disadvantages of the current solution: >>> > > >>> > > * We have no way to do C-heap checking in release builds. But there, it is sorely missed. 
We ship release VMs, and those VMs get used in myriad ways with a lot of faulty third-party native code. I would love to be able to flip a product switch at a customer site and have some basic C-heap checks done, without relegating to external tools or debug c-libs. >>> > > * The debug-only guards in os::malloc() are quite fat, really, a whopping 48 bytes per allocation on 64-bit, 40 bytes on 32-bit. That is for guarding alone. They distort the memory allocation picture, since blowing up every allocation this way causes the underlying libc to do different things. Therefore we have different memory layouts and allocation patterns between debug and release. In addition, we have different code paths too, e.g. in debug os::realloc calls os::malloc + os::free whereas in release builds it calls directly into libc ::realloc. All that means that in debug builds we test something different than what we ship in release builds. >>> > > * The canary in the headers of the debug-only guards do not directly precede the user portion of the data, so we won't catch negative buffer overflows of only a few bytes. >>> > > * The guarding added by CheckJNICalls is unnecessarily expensive too, since it copies the memory around, handing a copy of the guarded memory up to the caller. >>> > > * The fact that three different code sections all do malloc headers incurs unnecessary costs, and the code is unnecessarily complex. It makes also statistics difficult to understand since the silent overhead can be large (compare the rise in RSS with the rise in NMT allocations in a debug build). >>> > > * None of the current overflow checkers print out hex dumps of the violated memory. That is what the libc usually does and it is very useful. >>> > > >>> > > Thanks, Thomas >>> > >>> > >>> > p.s. I contemplated to do NMT overflow checks and removal of old guarding code in one RFE but was concerned that it would be too confusing and get stuck in review limbo. Maybe that was wrong. 
But this RFE here makes more sense when viewed as part of a whole.
>>>
>>> Thanks for explanation. So, buffer overrun detection is now only available when NMT is on, vs. always on with GuardedMemory in debug build. Right?
>>
>> Well, not with this patch obviously. But yes, that would be my proposal. To get "always-on", we could switch NMT on by default in debug builds. "summary" level is not really expensive at all, it uses less memory than GuardedMemory does, and the per-flag accounting does not really add much overhead (GuardedMemory also does some accounting btw).
>>
>> Though tbh my first priority is to give us overflow checks in release builds. If we only do that and leave GuardedMemory in place I would be happy already. I had two customer cases very recently with heap overwriters, one of which I misused NMT to trigger a crash and analyze the core. A neighboring (non-VM-allocated) block was overwriting the following (VM allocated) heap block.
>>
>> Cheers, Thomas
>
> I have no problem on technical side. Changing NMT default value, I believe, needs CSR. Probably should start with a CSR to get a consensus.
>
> Thanks.
>
> -Zhengyu

Added a thorough regression test for the `realloc` issue, but I was unable to provoke in practice the theoretical error @zhengyu123 pointed out. When analyzing, I found that the current code already works by accident:

- a realloc to a smaller size will memcpy() the original payload with the new size, since we use MIN2(size, memblock_size) and use the new, smaller, payload size. That will leave the NMT footer intact which had been added by the os::malloc above.
- a realloc to a larger size will memcpy() with memblock_size, and Zhengyu is right, that is too large. The effect of that is that we copy the original footer too. But that is fine. Since the footer is only one byte, we will, again, leave the new NMT footer added by os::malloc() intact.

Still, Zhengyu was right, this is a problem.
I will experiment with a larger footer since I believe that should fail as predicted (I just want to see my new regression tests actually fire :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From sspitsyn at openjdk.java.net Fri Nov 19 10:11:39 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Fri, 19 Nov 2021 10:11:39 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Nov 2021 17:15:06 GMT, Daniel D. Daugherty wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>>
>> get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL
>
> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1533:
>
>> 1531:     return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
>> 1532:   }
>> 1533:   assert(java_thread == _state->get_thread(), "Must be");
>
> This `assert()` is the site of the original test failure. I haven't yet
> looked at the locations of the other changes.
>
> The `is_exiting()` check is made under the protection of the
> `JvmtiThreadState_lock` so an unsuspended target thread that is
> exiting cannot reach the point where the `_state` is updated to
> clear the `JavaThread*` so we can't fail the `assert()` if the
> `is_exiting()` check has returned `false`.

Dan,

Thank you for reviewing this!
I'm not sure I understand you correctly here.
Are you saying that you agree with this change?
In fact, the thread state cannot be changed (and the assert fired) after the `is_exiting()` check is made, even without `JvmtiThreadState_lock` protection, because it is inside of a handshake execution.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From sspitsyn at openjdk.java.net Fri Nov 19 10:17:39 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Fri, 19 Nov 2021 10:17:39 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To:
References:
Message-ID: <2l9gjieNV6K8UMLcGHO_CtSWzzN5Kv45pFt6_3OZ85o=.1ae38c09-bf00-45ec-ac96-838469a5f7a7@github.com>

On Thu, 18 Nov 2021 17:18:23 GMT, Daniel D. Daugherty wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>>
>> get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL
>
> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1401:
>
>> 1399:   if (!self) {
>> 1400:     if (!java_thread->is_suspended()) {
>> 1401:       _result = JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>
> I don't see an obvious reason for this `is_exiting()` check.

Okay. I see a similar check in the `force_early_return()` function:

    if (state == NULL) {
      return JVMTI_ERROR_THREAD_NOT_ALIVE;
    }

Would it be better to replace it with this check instead?

    if (java_thread->is_exiting()) {
      return JVMTI_ERROR_THREAD_NOT_ALIVE;
    }

> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1625:
>
>> 1623:     return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */
>> 1624:   }
>> 1625:   assert(_state->get_thread() == java_thread, "Must be");
>
> The `assert()` on L1625 is subject to the same race as the original site.
> This `is_exiting()` check is made under the protection of the
> `JvmtiThreadState_lock` so it is sufficient to protect that `assert()`.

Okay, thanks!
------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From stuefe at openjdk.java.net Fri Nov 19 14:25:46 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 19 Nov 2021 14:25:46 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v5] In-Reply-To: References: Message-ID: > This is part of a number of RFE I plan to improve and simplify C-heap overflow checking in hotspot. > > This proposal adds NMT buffer overflow checking: > > - it gives us C-heap overflow checking in release builds > - the costs are neglectable: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway. > - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check. > - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch. > > For more details, please see the JBS issue. > > ---- > > Patch notes: > > - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp. > - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512. 
> - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere. > > - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios. > > - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting. > > - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE). > > - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now. > > - I made the assert for malloc site table width a compile time STATIC_ASSERT. > > -------------- > > Example output a buffer overrun would provide: > > > Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) 
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: > 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 > 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00 > 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)# > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805 > # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) 
> # > > ------- > > Tests: > - manual tests with Linux x64, x86, minimal build > - GHAs all clean > - SAP nightlies ran for 4 weeks now without problems Thomas Stuefe has updated the pull request incrementally with 115 additional commits since the last revision: - Fix Zhengyu Problem in os::realloc - update comment after increasing footer - improve test - 8277439: G1: Correct include guard name in G1EvacFailureObjectsSet.hpp Reviewed-by: tschatzl, sjohanss - 8277371: Remove unnecessary DefNewGeneration::ref_processor_init() Reviewed-by: stefank, tschatzl, mli - 8277324: C2 compilation fails with "bad AD file" on x86-32 after JDK-8276162 due to missing match rule Reviewed-by: chagedorn, roland - 8273039: JShell crashes when naming variable or method "abstract" or "strictfp" Reviewed-by: vromero - 8277213: CompileTask_lock is acquired out of order with MethodCompileQueue_lock Reviewed-by: rbackman, coleenp - 8275643: C2's unaryOp vector intrinsic does not properly handle LongVector.neg Reviewed-by: chagedorn, sviswanathan - 8277102: Dubious PrintCompilation output Reviewed-by: thartmann, dnsimon - ... 
and 105 more: https://git.openjdk.java.net/jdk/compare/d3677c1f...17a5bc71 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5952/files - new: https://git.openjdk.java.net/jdk/pull/5952/files/d3677c1f..17a5bc71 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=03-04 Stats: 39754 lines in 658 files changed: 28691 ins; 5278 del; 5785 mod Patch: https://git.openjdk.java.net/jdk/pull/5952.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5952/head:pull/5952 PR: https://git.openjdk.java.net/jdk/pull/5952 From stuefe at openjdk.java.net Fri Nov 19 14:29:17 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 19 Nov 2021 14:29:17 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v6] In-Reply-To: References: Message-ID: > This is part of a number of RFE I plan to improve and simplify C-heap overflow checking in hotspot. > > This proposal adds NMT buffer overflow checking: > > - it gives us C-heap overflow checking in release builds > - the costs are neglectable: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway. > - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check. > - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch. > > For more details, please see the JBS issue. > > ---- > > Patch notes: > > - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. 
For more details, see the comment in mallocTracker.hpp. > - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512. > - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere. > > - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios. > > - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting. > > - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE). > > - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now. > > - I made the assert for malloc site table width a compile time STATIC_ASSERT. > > -------------- > > Example output a buffer overrun would provide: > > > Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) 
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: > 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 > 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00 > 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)# > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805 > # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) > # > > ------- > > Tests: > - manual tests with Linux x64, x86, minimal build > - GHAs all clean > - SAP nightlies ran for 4 weeks now without problems Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits:

- Volker Feedback 2
- Fix Zhengyu Problem in os::realloc
- Extend gtests
- extend footer to 2 bytes
- Feedback Volker
- Let NMT do overflow detection

-------------

Changes: https://git.openjdk.java.net/jdk/pull/5952/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5952&range=05
Stats: 434 lines in 11 files changed: 385 ins; 11 del; 38 mod
Patch: https://git.openjdk.java.net/jdk/pull/5952.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5952/head:pull/5952

PR: https://git.openjdk.java.net/jdk/pull/5952

From rkennke at openjdk.java.net Fri Nov 19 14:43:59 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Fri, 19 Nov 2021 14:43:59 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass
Message-ID:

In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When encountering such a load, the result may be decoded when +UseCompressedClassPointers is set. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.

Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.

The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Note that I could not test anything but x86; all other platforms only received very basic testing via GHA. It would be nice if the respective maintainers could give it a try.
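To make the hazard concrete, here is a minimal, self-contained C++ sketch. It is not C1 code: `DemoInsn`, `DemoOp`, `DemoType`, `kKlassOffset` and both predicates are hypothetical names invented for illustration, and the offset value 8 is simply one of the two values mentioned above. The sketch only contrasts recognizing a klass load by its shape (type plus offset) with recognizing it by a dedicated opcode.

```cpp
#include <cassert>

// Hypothetical miniature of an IR instruction; not the real C1 LIR classes.
enum class DemoType { Int, Address };
enum class DemoOp { Load, LoadKlass };

struct DemoInsn {
  DemoOp   op;
  DemoType type;
  int      offset;  // byte offset from the base object
};

// Assumed klass offset for this sketch (the text above mentions 8 or 4).
constexpr int kKlassOffset = 8;

// Shape-based identification: any address-typed load at the klass offset
// is treated as a klass load, whether or not it really is one.
bool is_klass_load_by_pattern(const DemoInsn& insn) {
  return insn.op == DemoOp::Load &&
         insn.type == DemoType::Address &&
         insn.offset == kKlassOffset;
}

// Opcode-based identification: only an explicit LoadKlass instruction counts.
bool is_klass_load_by_opcode(const DemoInsn& insn) {
  return insn.op == DemoOp::LoadKlass;
}
```

With the shape-based predicate, an unrelated address load that happens to use offset 8 is classified as a klass load and would be decoded; with a dedicated opcode the question never depends on the offset.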
Testing:
- [x] tier1 (x86_64)
- [ ] tier2 (x86_64)
- [ ] tier3 (x86_64)

-------------

Commit messages:
- Add debug info for null checks
- 8277417: C1 LIR instruction for load-klass

Changes: https://git.openjdk.java.net/jdk/pull/6464/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6464&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8277417
Stats: 183 lines in 11 files changed: 134 ins; 37 del; 12 mod
Patch: https://git.openjdk.java.net/jdk/pull/6464.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6464/head:pull/6464

PR: https://git.openjdk.java.net/jdk/pull/6464

From iveresov at openjdk.java.net Fri Nov 19 14:44:00 2021
From: iveresov at openjdk.java.net (Igor Veresov)
Date: Fri, 19 Nov 2021 14:44:00 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass
In-Reply-To:
References:
Message-ID:

On Thu, 18 Nov 2021 20:16:27 GMT, Roman Kennke wrote:

> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When encountering such a load, the result may be decoded when +UseCompressedClassPointers is set. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.
>
> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.
>
> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Note that I could not test anything but x86; all other platforms only received very basic testing via GHA. It would be nice if the respective maintainers could give it a try.
>
> Testing:
> - [x] tier1 (x86_64)
> - [ ] tier2 (x86_64)
> - [ ] tier3 (x86_64)

Nice! Thank you!

-------------

Marked as reviewed by iveresov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6464

From stuefe at openjdk.java.net Fri Nov 19 14:45:42 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 19 Nov 2021 14:45:42 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v3]
In-Reply-To: <9GqbVZKY1Z5fCvB-vuCwqIFwPXEDU1nHd002J3SS2KM=.a1ca56cc-016b-4daf-9f69-5bbf60f32e71@github.com>
References: <9GqbVZKY1Z5fCvB-vuCwqIFwPXEDU1nHd002J3SS2KM=.a1ca56cc-016b-4daf-9f69-5bbf60f32e71@github.com>
Message-ID:

On Thu, 18 Nov 2021 17:33:50 GMT, Volker Simonis wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Feedback Volker
>
> Looks good to me now except for Zhengyu's question.

Hi @simonis, @zhengyu123,

I somehow messed up and force-pushed. This is what happened since the last review:

- https://github.com/openjdk/jdk/pull/5952/commits/2247b5e6f6d6aff2c54fe51f1e064876bba43963 : increases the footer canary to two bytes.
- https://github.com/openjdk/jdk/pull/5952/commits/188f0ea36a12d20be960ec98eb303669c6fcd714 : extended the gtests, mainly to test realloc more thoroughly.
  - Added one death test to show that realloc also does heap corruption checks on the old block.
  - Another test - not a death test but a regular test - just to test that realloc works. This was in reaction to the bug Zhengyu found, but I never got it to fire. After analysing I believe the bug was benign. Still, good to have this test.
- https://github.com/openjdk/jdk/pull/5952/commits/ea6fe31c08af1a7073e3d14a37a572beb43a027c : This one actually fixes the bug Zhengyu found. I kept the fix as simple as possible and refrained from cleaning up too much. I just removed two methods which were not needed anymore.
- https://github.com/openjdk/jdk/pull/5952/commits/3d2a5d00b7dac5411f3c1956a4b5c8b6e1a76a66 : Last one fixes the last typos Volker found.

Thanks again, guys, for your reviews. I plan to give this another round in our test systems before pushing.
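
For readers who want a concrete picture of the header and footer canaries and of the realloc copy discussed in this thread, here is a minimal C++ sketch. It is an illustration under assumptions, not the code in mallocTracker.cpp or os.cpp: `DemoHeader`, `demo_malloc`, `demo_check`, `demo_realloc`, `demo_free` and both canary values are hypothetical, and `abort()` merely stands in for real error reporting. What it tries to show is a 16-bit canary directly in front of the payload, a two-byte footer directly behind it, and a realloc that checks the old block and copies payload bytes only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical 16-byte header; the canary is the last field, so it sits
// directly in front of the payload.
struct DemoHeader {
  std::uint64_t size;      // payload size in bytes
  std::uint32_t reserved;  // stand-in for other bookkeeping
  std::uint16_t bucket;    // stand-in for a site table index
  std::uint16_t canary;
};
static_assert(sizeof(DemoHeader) == 16, "sketch assumes a 16-byte header");

// Arbitrary marker values chosen for this sketch.
constexpr std::uint16_t kHeaderCanary = 0xE99E;
constexpr std::uint16_t kFooterCanary = 0xE88E;

enum class BlockState { Ok, HeaderBroken, FooterBroken };

void* demo_malloc(std::size_t size) {
  auto* raw = static_cast<std::uint8_t*>(
      std::malloc(sizeof(DemoHeader) + size + sizeof(kFooterCanary)));
  if (raw == nullptr) return nullptr;
  const DemoHeader h{size, 0, 0, kHeaderCanary};
  std::memcpy(raw, &h, sizeof(h));
  // The footer may be unaligned, so it is written bytewise.
  std::memcpy(raw + sizeof(h) + size, &kFooterCanary, sizeof(kFooterCanary));
  return raw + sizeof(h);
}

BlockState demo_check(const void* payload) {
  const auto* p = static_cast<const std::uint8_t*>(payload);
  DemoHeader h;
  std::memcpy(&h, p - sizeof(h), sizeof(h));
  if (h.canary != kHeaderCanary) return BlockState::HeaderBroken;
  std::uint16_t footer;
  std::memcpy(&footer, p + h.size, sizeof(footer));
  return footer == kFooterCanary ? BlockState::Ok : BlockState::FooterBroken;
}

void demo_free(void* payload) {
  if (payload == nullptr) return;
  std::free(static_cast<std::uint8_t*>(payload) - sizeof(DemoHeader));
}

void* demo_realloc(void* old_payload, std::size_t new_size) {
  if (old_payload == nullptr) return demo_malloc(new_size);
  // Check the old block first; abort() stands in for real error reporting.
  if (demo_check(old_payload) != BlockState::Ok) std::abort();
  DemoHeader old_header;
  std::memcpy(&old_header,
              static_cast<std::uint8_t*>(old_payload) - sizeof(old_header),
              sizeof(old_header));
  void* fresh = demo_malloc(new_size);
  if (fresh == nullptr) return nullptr;
  // Copy payload bytes only, never the old footer.
  const std::size_t n = std::min<std::size_t>(old_header.size, new_size);
  std::memcpy(fresh, old_payload, n);
  demo_free(old_payload);
  return fresh;
}
```

Bounding the copy by the smaller of the old and new payload sizes keeps the freshly written footer of the new block intact regardless of how wide the footer is.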

Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/5952

From stefank at openjdk.java.net Fri Nov 19 15:37:46 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Fri, 19 Nov 2021 15:37:46 GMT
Subject: Integrated: 8277212: GC accidentally cleans valid megamorphic vtable inline caches
In-Reply-To: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
References: <9wD0oJ2P5bG1qYJ71qdCN4-Q_fiOkKWh4kXTdw8Yb8o=.fc8b7be9-7818-4414-9177-542e2b061480@github.com>
Message-ID:

On Thu, 18 Nov 2021 09:56:37 GMT, Stefan Karlsson wrote:

> We got a report on the zgc-dev list about a large performance issue affecting ZGC:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001086.html
>
> One of the issues that the reporter identified was that we could get extremely long class unloading / unlinking times:
>
> [17606.140s][info][gc,phases ] GC(719) Concurrent Process Non-Strong References 25781.928ms
> [17610.181s][info][gc,stats ] Subphase: Concurrent Classes Unlink 14280.772 / 25769.511 1126.563 / 25769.511 217.882 / 68385.750 217.882 / 68385.750 ms
>
> and while this was happening we got a huge number of ICBufferFull safepoints.
>
> It turns out that we have a 10-year-old bug in the inline cache cleaning code. This code came in with the permgen removal. See how the original code only calls set_to_clean when ic_oop is non-null:
>
> https://github.com/openjdk/jdk/commit/5c58d27aac7b291b879a7a3ff6f39fca25619103
>
>   CompiledIC *ic = CompiledIC_at(iter.reloc());
>   oop ic_oop = ic->cached_oop();
>   if (ic_oop != NULL && !is_alive->do_object_b(ic_oop)) {
>     // The only exception is compiledICHolder oops which may
>     // yet be marked below. (We check this further below).
>     if (ic_oop->is_compiledICHolder()) {
>       compiledICHolderOop cichk_oop = compiledICHolderOop(ic_oop);
>       if (is_alive->do_object_b(
>             cichk_oop->holder_method()->method_holder()) &&
>           is_alive->do_object_b(cichk_oop->holder_klass())) {
>         continue;
>       }
>     }
>     ic->set_to_clean();
>     assert(ic->cached_oop() == NULL,
>            "cached oop in IC should be cleared");
>   }
> }
>
>
> The rewritten code put the set_to_clean call in a different scope, causing the CompiledIC to also be cleaned when ic_oop is NULL:
>
>   CompiledIC *ic = CompiledIC_at(iter.reloc());
>   if (ic->is_icholder_call()) {
>     // The only exception is compiledICHolder oops which may
>     // yet be marked below. (We check this further below).
>     CompiledICHolder* cichk_oop = ic->cached_icholder();
>     if (cichk_oop->holder_method()->method_holder()->is_loader_alive(is_alive) &&
>         cichk_oop->holder_klass()->is_loader_alive(is_alive)) {
>       continue;
>     }
>   } else {
>     Metadata* ic_oop = ic->cached_metadata();
>     if (ic_oop != NULL) {
>       if (ic_oop->is_klass()) {
>         if (((Klass*)ic_oop)->is_loader_alive(is_alive)) {
>           continue;
>         }
>       } else if (ic_oop->is_method()) {
>         if (((Method*)ic_oop)->method_holder()->is_loader_alive(is_alive)) {
>           continue;
>         }
>       } else {
>         ShouldNotReachHere();
>       }
>     }
>   }
>   ic->set_to_clean();
> }
>
>
> Note the weird indentation, which could be seen as a hint that this might be a dubious / accidental change.
>
> To understand why this is causing the problems we are seeing it's good to start by reading:
> https://wiki.openjdk.java.net/display/HotSpot/Overview+of+CompiledIC+and+CompiledStaticCall
>
> When the GC hits this path and finds an ic_oop that is NULL, it means that it is dealing with an inline cache that is a megamorphic vtable call (or clean). Those should not be cleaned (at least that wasn't the intention of the old code).
>
> But now we do clean them, and to do so we use an ICStub (to make a safe transition to the clean state), which uses up slots in the ICBuffer.
> When the ICBuffer is full, concurrent GCs have to stop and schedule an ICBufferFull safepoint stop-the-world operation, which removes the ICStub from the inline cache and completely frees up the ICBuffer. If the GC cleans a lot of these megamorphic vtable inline caches, then we'll create a large number of ICBufferFull safepoints.
>
> But it is even worse than that. After the class unloading GCs have destroyed all megamorphic vtable inline caches, the Java threads will see these cleaned inline caches and correct them. Correcting the cleaned inline caches from the Java threads will also use ICStubs, and eventually the inline caches will transition back to megamorphic vtable calls. Because of this we can end up in a situation where the GC and Java threads change the inline cache back and forth between clean and megamorphic vtable calls. When this happens, both GC and Java threads will continuously schedule ICBufferFull safepoints, and this can go on for many seconds, even minutes, if we are unlucky. For ZGC this has the effect that it blocks any further GC work, and eventually the Java threads will run out of memory and hit allocation stalls. The Java threads will then wait for the GC to "clean" all inline caches and exit the class unloading phase and proceed to the phase where memory is reclaimed. You can see in the GC logs that even though the problematic unlinking phase goes on for many seconds, the allocation stalls are "only" a few hundred milliseconds. This shows that when the Java threads stop fighting over the inline caches, the GC can finish the work relatively quickly.
>
> G1 performs the inline cache cleaning while the Java threads are stopped, and therefore doesn't have to use ICStubs when the megamorphic vtables are accidentally cleaned. So, G1 (and other stop-the-world class unloading GCs) won't enter the situation where the GC and Java threads concurrently fight over the inline caches. It still causes the Java threads to have to take a slow path and fix the inline caches, which can result in unnecessary ICBufferFull safepoints.
>
> I've been able to reproduce the issue where ZGC and the Java threads fight over the ICStubs, causing minute-long unloading times, by running one of the microbenchmarks from the Blackbird library used by the reporter of this issue. See description in:
> https://mail.openjdk.java.net/pipermail/zgc-dev/2021-November/001096.html
>
> I think this could be reproduced in other workloads as well. I've also been able to reproduce the excessive ICBufferFull safepoints with Kitchensink (an Oracle-internal stress test).
>
> I've verified that restoring the set_to_clean code to the right scope fixes the issue that I can reproduce with both Blackbird and Kitchensink. After the fix, the class unloading times go back to normal levels.
>
> To identify this issue, it's good to run with -Xlog:gc*,safepoint and take note of the "Concurrent Process Non-Strong References" times and ICBufferFull safepoint lines.
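
The scoping difference described above can be modelled in a few lines of standalone C++. This is a simplified sketch, not HotSpot code: `DemoMetadata`, `DemoIC`, `clean_as_rewritten` and `clean_with_restored_scope` are hypothetical names, the icholder branch is left out, and the second function is only one plausible shape of "restoring the set_to_clean code to the right scope", not necessarily the patch that was integrated. A null `cached` pointer stands for an inline cache that is clean or a megamorphic vtable call.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for Metadata and CompiledIC.
struct DemoMetadata {
  bool loader_alive;
};

struct DemoIC {
  const DemoMetadata* cached;  // nullptr: clean or megamorphic vtable call
  bool cleaned;
  explicit DemoIC(const DemoMetadata* m) : cached(m), cleaned(false) {}
  void set_to_clean() { cleaned = true; }
};

// Shape of the rewritten code: set_to_clean() sits outside the null check,
// so inline caches without cached metadata are cleaned as well.
int clean_as_rewritten(std::vector<DemoIC>& ics) {
  int cleaned = 0;
  for (DemoIC& ic : ics) {
    if (ic.cached != nullptr) {
      if (ic.cached->loader_alive) {
        continue;
      }
    }
    ic.set_to_clean();
    ++cleaned;
  }
  return cleaned;
}

// Assumed shape after restoring the scope: only inline caches whose cached
// metadata exists and is dead are cleaned.
int clean_with_restored_scope(std::vector<DemoIC>& ics) {
  int cleaned = 0;
  for (DemoIC& ic : ics) {
    if (ic.cached != nullptr && !ic.cached->loader_alive) {
      ic.set_to_clean();
      ++cleaned;
    }
  }
  return cleaned;
}
```

In this model every spuriously cleaned entry corresponds to an ICStub that a concurrent collector would have to allocate, which is where the ICBuffer pressure in the description comes from.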
> > Example logs from ZGC where concurrent cleaning causes ICBufferFull safepoints: > > [38.557s][1637062062666ms][info ][gc,phases ] GC(222) Concurrent Mark Free 0.001ms > [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 7389821 ns, Reaching safepoint: 167546 ns, At safepoint: 6840 ns, Total: 174386 ns > [38.565s][1637062062673ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 27749 ns, Reaching safepoint: 89368 ns, At safepoint: 5710 ns, Total: 95078 ns > [38.566s][1637062062674ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 678872 ns, Reaching safepoint: 145967 ns, At safepoint: 6969 ns, Total: 152936 ns > [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 934596 ns, Reaching safepoint: 165826 ns, At safepoint: 5460 ns, Total: 171286 ns > [38.567s][1637062062675ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16500 ns, Reaching safepoint: 91147 ns, At safepoint: 5770 ns, Total: 96917 ns > [38.568s][1637062062677ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1124041 ns, Reaching safepoint: 154426 ns, At safepoint: 6280 ns, Total: 160706 ns > [38.570s][1637062062678ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1222819 ns, Reaching safepoint: 152646 ns, At safepoint: 6920 ns, Total: 159566 ns > [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1070303 ns, Reaching safepoint: 152686 ns, At safepoint: 6029 ns, Total: 158715 ns > [38.571s][1637062062679ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 23650 ns, Reaching safepoint: 83208 ns, At safepoint: 6170 ns, Total: 89378 ns > [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1005014 ns, Reaching safepoint: 148206 ns, At safepoint: 5660 ns, Total: 153866 ns > [38.572s][1637062062681ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 15110 ns, 
Reaching safepoint: 84047 ns, At safepoint: 5690 ns, Total: 89737 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1370755 ns, Reaching safepoint: 171876 ns, At safepoint: 5030 ns, Total: 176906 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19749 ns, Reaching safepoint: 82478 ns, At safepoint: 4740 ns, Total: 87218 ns > [38.574s][1637062062682ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12480 ns, Reaching safepoint: 86707 ns, At safepoint: 5040 ns, Total: 91747 ns > [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 904007 ns, Reaching safepoint: 162666 ns, At safepoint: 5160 ns, Total: 167826 ns > [38.575s][1637062062684ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14269 ns, Reaching safepoint: 80878 ns, At safepoint: 5420 ns, Total: 86298 ns > [38.577s][1637062062685ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1240908 ns, Reaching safepoint: 144267 ns, At safepoint: 7030 ns, Total: 151297 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 971325 ns, Reaching safepoint: 175725 ns, At safepoint: 4710 ns, Total: 180435 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 16140 ns, Reaching safepoint: 80258 ns, At safepoint: 5389 ns, Total: 85647 ns > [38.578s][1637062062686ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 10290 ns, Reaching safepoint: 80858 ns, At safepoint: 5530 ns, Total: 86388 ns > [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 430509 ns, Reaching safepoint: 159906 ns, At safepoint: 4610 ns, Total: 164516 ns > [38.579s][1637062062687ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18269 ns, Reaching safepoint: 83838 ns, At safepoint: 4520 ns, Total: 88358 ns > [38.579s][1637062062687ms][info 
][safepoint ] Safepoint "ICBufferFull", Time since last: 13270 ns, Reaching safepoint: 77928 ns, At safepoint: 4790 ns, Total: 82718 ns > [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 384230 ns, Reaching safepoint: 193705 ns, At safepoint: 4080 ns, Total: 197785 ns > [38.579s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14099 ns, Reaching safepoint: 80908 ns, At safepoint: 4840 ns, Total: 85748 ns > [38.580s][1637062062688ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 9150 ns, Reaching safepoint: 79268 ns, At safepoint: 4890 ns, Total: 84158 ns > [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 549396 ns, Reaching safepoint: 143086 ns, At safepoint: 6430 ns, Total: 149516 ns > [38.580s][1637062062689ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12540 ns, Reaching safepoint: 94717 ns, At safepoint: 5800 ns, Total: 100517 ns > [38.581s][1637062062690ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 847758 ns, Reaching safepoint: 146687 ns, At safepoint: 5969 ns, Total: 152656 ns > [38.582s][1637062062691ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 972285 ns, Reaching safepoint: 128177 ns, At safepoint: 6350 ns, Total: 134527 ns > [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 986975 ns, Reaching safepoint: 136396 ns, At safepoint: 5770 ns, Total: 142166 ns > [38.584s][1637062062692ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 17280 ns, Reaching safepoint: 87097 ns, At safepoint: 5270 ns, Total: 92367 ns > [38.585s][1637062062693ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1143131 ns, Reaching safepoint: 188315 ns, At safepoint: 5250 ns, Total: 193565 ns > [38.585s][1637062062694ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 12200 ns, Reaching safepoint: 80168 ns, At safepoint: 
7480 ns, Total: 87648 ns > [38.586s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1153410 ns, Reaching safepoint: 166846 ns, At safepoint: 7060 ns, Total: 173906 ns > [38.587s][1637062062695ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 21549 ns, Reaching safepoint: 89898 ns, At safepoint: 5360 ns, Total: 95258 ns > [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1129411 ns, Reaching safepoint: 156726 ns, At safepoint: 4810 ns, Total: 161536 ns > [38.588s][1637062062696ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14089 ns, Reaching safepoint: 80588 ns, At safepoint: 5170 ns, Total: 85758 ns > [38.589s][1637062062697ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 644824 ns, Reaching safepoint: 140666 ns, At safepoint: 5990 ns, Total: 146656 ns > [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1085312 ns, Reaching safepoint: 254264 ns, At safepoint: 5440 ns, Total: 259704 ns > [38.590s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14609 ns, Reaching safepoint: 83748 ns, At safepoint: 5610 ns, Total: 89358 ns > [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 387680 ns, Reaching safepoint: 201215 ns, At safepoint: 5340 ns, Total: 206555 ns > [38.591s][1637062062699ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 18929 ns, Reaching safepoint: 85098 ns, At safepoint: 5910 ns, Total: 91008 ns > [38.591s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 380750 ns, Reaching safepoint: 175066 ns, At safepoint: 4730 ns, Total: 179796 ns > [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 14500 ns, Reaching safepoint: 80577 ns, At safepoint: 6790 ns, Total: 87367 ns > [38.592s][1637062062700ms][info ][safepoint ] Safepoint "ICBufferFull", Time 
since last: 14660 ns, Reaching safepoint: 78498 ns, At safepoint: 7180 ns, Total: 85678 ns > [38.592s][1637062062701ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 655783 ns, Reaching safepoint: 141717 ns, At safepoint: 6089 ns, Total: 147806 ns > [38.594s][1637062062702ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 916657 ns, Reaching safepoint: 144226 ns, At safepoint: 5360 ns, Total: 149586 ns > [38.595s][1637062062703ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 1012334 ns, Reaching safepoint: 133037 ns, At safepoint: 10439 ns, Total: 143476 ns > [38.597s][1637062062705ms][info ][gc,phases ] GC(222) Concurrent Process Non-Strong References 39.443ms > > > Example logs from G1 where the Java threads fixes the cleaned inline caches and run out of ICStubs: > > [125.998s][1637065197322ms][info ][gc ] GC(1040) Pause Remark 586M->414M(2048M) 6.609ms > [125.998s][1637065197322ms][info ][gc,cpu ] GC(1040) User=0.08s Sys=0.00s Real=0.01s > [125.998s][1637065197322ms][info ][safepoint ] Safepoint "G1Concurrent", Time since last: 33150646 ns, Reaching safepoint: 103457 ns, At safepoint: 6666988 ns, Total: 6770445 ns > [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Mark 38.296ms > [125.998s][1637065197322ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets > [126.001s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2621782 ns, Reaching safepoint: 626684 ns, At safepoint: 9340 ns, Total: 636024 ns > [126.002s][1637065197326ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 19949 ns, Reaching safepoint: 714022 ns, At safepoint: 12160 ns, Total: 726182 ns > [126.007s][1637065197331ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 4665009 ns, Reaching safepoint: 339751 ns, At safepoint: 9640 ns, Total: 349391 ns > [126.009s][1637065197334ms][info ][safepoint ] Safepoint "ICBufferFull", Time since last: 2274802 ns, Reaching 
safepoint: 365760 ns, At safepoint: 9250 ns, Total: 375010 ns
> [126.027s][1637065197352ms][info ][gc,marking ] GC(1040) Concurrent Rebuild Remembered Sets 29.618ms
> 
> 
> I've tested the performance of the change with SPECjbb2015, SPECjvm2008, DaCapo, Renaissance.
> 
> I've run the patch through tier1-7.
> 
> Note that I've made the patch as small as possible to make it easier to backport. Thanks @fisk for discussion and explanation of the inline caches code.

This pull request has now been integrated.

Changeset: 976c2bb0
Author: Stefan Karlsson
URL: https://git.openjdk.java.net/jdk/commit/976c2bb05611cdc7b11b0918aaf50ff693507aae
Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod

8277212: GC accidentally cleans valid megamorphic vtable inline caches

Reviewed-by: eosterlund, pliden, coleenp, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/6450

From dcubed at openjdk.java.net Fri Nov 19 17:31:52 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Fri, 19 Nov 2021 17:31:52 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote:

>> The test fails when the target JavaThread has is_exiting() status. In such a case JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa.
>> The fix is to add an early check for is_exiting() status in the handshake closure's do_thread().
>> The following handshake closures are fixed by this update:
>> - UpdateForPopTopFrameClosure
>> - SetForceEarlyReturn
>> - SetFramePopClosure
> 
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL

Marked as reviewed by dcubed (Reviewer).
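
For readers following the thread, the early-exit pattern described in the quoted fix could be sketched roughly as below. This is a simplified, self-contained illustration and not the HotSpot code: `Thread`, `ThreadState`, `PopFrameClosure` and `Result` are hypothetical stand-ins, and the only behavior taken from the thread is that the closure returns early for an exiting target, leaving the default "thread not alive" result in place, before the assert on the thread state is reached.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical stand-ins; none of these are HotSpot types.
enum class Result { ok, thread_not_alive };

struct Thread {
  bool exiting = false;  // models JavaThread::is_exiting()
};

struct ThreadState {
  Thread* thread;  // assumed to be stale once the thread has been cleaned up
};

class PopFrameClosure {
 public:
  explicit PopFrameClosure(ThreadState* state) : state_(state) {}

  // Runs in the handshake: the target cannot start exiting while this executes.
  void do_thread(Thread* target) {
    if (target->exiting) {
      return;  // result_ keeps its default: thread_not_alive
    }
    // Only reached for a live target, so the state still points at it.
    assert(target == state_->thread);
    result_ = Result::ok;
  }

  Result result() const { return result_; }

 private:
  ThreadState* state_;
  Result result_ = Result::thread_not_alive;
};

}  // namespace sketch
```

With this shape, an exiting target whose state has already been cleaned up yields `thread_not_alive` instead of tripping the assert, which is the failure mode the subject line reports.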
-------------

PR: https://git.openjdk.java.net/jdk/pull/6440

From dcubed at openjdk.java.net Fri Nov 19 17:32:19 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Fri, 19 Nov 2021 17:32:19 GMT
Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3]
In-Reply-To: <2l9gjieNV6K8UMLcGHO_CtSWzzN5Kv45pFt6_3OZ85o=.1ae38c09-bf00-45ec-ac96-838469a5f7a7@github.com>
References: <2l9gjieNV6K8UMLcGHO_CtSWzzN5Kv45pFt6_3OZ85o=.1ae38c09-bf00-45ec-ac96-838469a5f7a7@github.com>
Message-ID: 

On Fri, 19 Nov 2021 10:14:23 GMT, Serguei Spitsyn wrote:

>> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1401:
>> 
>>> 1399:   if (!self) {
>>> 1400:     if (!java_thread->is_suspended()) {
>>> 1401:       _result = JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>> 
>> I don't see an obvious reason for this `is_exiting()` check.
> 
> Okay. I see a similar check in the `force_early_return()` function:
> 
>   if (state == NULL) {
>     return JVMTI_ERROR_THREAD_NOT_ALIVE;
>   }
> 
> Would it be better to replace it with this check instead?
> 
>   if (java_thread->is_exiting()) {
>     return JVMTI_ERROR_THREAD_NOT_ALIVE;
>   }
> 
> Removing this check and keeping the one inside the handshake would be even better.
> 
> I would also add this line for symmetry with the two other cases:
> 
>   + MutexLocker mu(JvmtiThreadState_lock);
>     SetForceEarlyReturn op(state, value, tos);

My point is that I don't see why you added the `is_exiting()` check since I don't see a race in that function, i.e., there's no `assert()` in this function that you need to protect.

As for adding the `MutexLocker mu(JvmtiThreadState_lock)`, you'll have to analyze and justify why you would need to add that lock grab independent of this fix. I'm not seeing a bug there, but I haven't looked very closely.
>> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1533: >> >>> 1531: return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */ >>> 1532: } >>> 1533: assert(java_thread == _state->get_thread(), "Must be"); >> >> This `assert()` is the site of the original test failure. I haven't yet >> looked at the locations of the other changes. >> >> The `is_exiting()` check is made under the protection of the >> `JvmtiThreadState_lock` so an unsuspended target thread that is >> exiting cannot reach the point where the `_state` is updated to >> clear the `JavaThread*` so we can't fail the `assert()` if the >> `is_exiting()` check has returned `false`. > > Dan, > Thank you for reviewing this! > I'm not sure, I correctly understand you here. > Are you saying that you agree with this change? > In fact, the thread state can not be changed (and the assert fired) after the `is_exiting()` check is made even without `JvmtiThreadState_lock` protection because it is inside of a handshake execution. I agree with the `is_exiting()` check addition. I forgot that we're executing a Handshake `doit()` function. So we have a couple of reasons why an unsuspended target thread can't change from `!is_exiting()` to `is_exiting()` while we are in this function. ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From hseigel at openjdk.java.net Fri Nov 19 17:32:19 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 19 Nov 2021 17:32:19 GMT Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for [v2] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 02:39:15 GMT, Coleen Phillimore wrote: >> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass. >> Tested with mach5 tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments. LGTM! Thanks for doing this. 
Harold

-------------

Marked as reviewed by hseigel (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6466

From coleenp at openjdk.java.net Fri Nov 19 17:32:27 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 19 Nov 2021 17:32:27 GMT
Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for [v2]
In-Reply-To: 
References: 
Message-ID: 

On Fri, 19 Nov 2021 02:39:15 GMT, Coleen Phillimore wrote:

>> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass.
>> Tested with mach5 tier1-3.
> 
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix comments.

Thanks Harold and David. I did run some startup and performance tests that might notice JNI (added link to bug for you) and there is no difference in performance.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6466

From mdoerr at openjdk.java.net Fri Nov 19 17:34:44 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Fri, 19 Nov 2021 17:34:44 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass
In-Reply-To: 
References: 
Message-ID: 

On Thu, 18 Nov 2021 20:16:27 GMT, Roman Kennke wrote:

> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When such a load is encountered, the result may be decoded when +UseCompressedClassPointers. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.
> 
> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.
> 
> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Notice that I could not test anything but x86; all other platforms only received very basic testing via GHA. It would be nice if the respective maintainers could give it a try.
> 
> Testing:
> - [x] tier1 (x86_64)
> - [x] tier2 (x86_64)
> - [x] tier3 (x86_64)

Looks like a nice change. I only found one problem on Power.

src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp line 2737:

> 2735:   if (info != NULL) {
> 2736:     add_debug_info_for_null_check_here(info);
> 2737:   }

I think this is incorrect for AIX. Note that the first page is not read protected on that OS. To make it consistent with other places, I suggest:

diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
index a772e48f3be..23e03cb36e3 100644
--- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
@@ -2733,7 +2733,11 @@ void LIR_Assembler::emit_load_klass(LIR_OpLoadKlass* op) {
 
   CodeEmitInfo* info = op->info();
   if (info != NULL) {
-    add_debug_info_for_null_check_here(info);
+    if (!os::zero_page_read_protected() || !ImplicitNullChecks) {
+      explicit_null_check(obj, info);
+    } else {
+      add_debug_info_for_null_check_here(info);
+    }
   }
 
   if (UseCompressedClassPointers) {

-------------

Changes requested by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6464

From simonis at openjdk.java.net Fri Nov 19 17:35:38 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Fri, 19 Nov 2021 17:35:38 GMT
Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v6]
In-Reply-To: 
References: 
Message-ID: <94GszC2UOkr7OG_XVItGSF0HOXlK3R_8kQwfxK4sxTE=.e4b0e6bb-928f-47a8-bb50-7263d0683d9a@github.com>

On Fri, 19 Nov 2021 14:29:17 GMT, Thomas Stuefe wrote:

>> This is one of a number of RFEs I plan in order to improve and simplify C-heap overflow checking in hotspot.
>> >> This proposal adds NMT buffer overflow checking: >> >> - it gives us C-heap overflow checking in release builds >> - the costs are neglectable: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway. >> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check. >> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch. >> >> For more details, please see the JBS issue. >> >> ---- >> >> Patch notes: >> >> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp. >> - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512. >> - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere. >> >> - I added a footer canary trailing the user allocation to catch tail buffer overruns. 
To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios. >> >> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting. >> >> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE). >> >> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now. >> >> - I made the assert for malloc site table width a compile time STATIC_ASSERT. >> >> -------------- >> >> Example output a buffer overrun would provide: >> >> >> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) >> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: >> 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 >> 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00 >> 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)# >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (mallocTracker.cpp:203), pid=10805, 
tid=10805 >> # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) >> # >> >> ------- >> >> Tests: >> - manual tests with Linux x64, x86, minimal build >> - GHAs all clean >> - SAP nightlies ran for 4 weeks now without problems > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Volker Feedback 2 > - Fix Zhengyu Problem in os::realloc > - Extend gtests > - extend footer to 2 bytes > - Feedback Volker > - Let NMT do overflow detection Looks good now. src/hotspot/share/services/mallocTracker.hpp line 435: > 433: } > 434: > 435: static inline void record_new_arena(MEMFLAGS flags) { Yes, I also wondered why we need these versions so it's good that you could remove them! ------------- Marked as reviewed by simonis (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5952 From stuefe at openjdk.java.net Fri Nov 19 17:48:04 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 19 Nov 2021 17:48:04 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v6] In-Reply-To: <94GszC2UOkr7OG_XVItGSF0HOXlK3R_8kQwfxK4sxTE=.e4b0e6bb-928f-47a8-bb50-7263d0683d9a@github.com> References: <94GszC2UOkr7OG_XVItGSF0HOXlK3R_8kQwfxK4sxTE=.e4b0e6bb-928f-47a8-bb50-7263d0683d9a@github.com> Message-ID: On Fri, 19 Nov 2021 16:30:41 GMT, Volker Simonis wrote: > Looks good now. Many thanks, Volker! Nice to have this issue finally going somewhere. I feared it was stuck in PR limbo till after 18 ships. 
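
As an aside for readers, the header and footer canary layout described in the quoted patch notes could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the NMT implementation: the names (`Header`, `tracked_malloc`, `check_block`, `tracked_free`) and the canary byte patterns are hypothetical, the header fields other than the 16-bit canary are placeholders, and the footer is a single byte as in the original notes (the commit list quoted above mentions it was later extended to 2 bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

namespace sketch {

// Arbitrary patterns chosen for this sketch only.
constexpr uint16_t kHeaderCanary = 0xE99E;
constexpr uint8_t kFooterCanary = 0xE8;

// 16-byte header; the canary is the last field, so it sits directly
// in front of the user payload and catches writes just before it.
struct Header {
  uint64_t size;        // payload size in bytes
  uint32_t flags;       // placeholder for bookkeeping fields
  uint16_t bucket_idx;  // placeholder for a 16-bit bucket index
  uint16_t canary;
};
static_assert(sizeof(Header) == 16, "header is expected to stay 16 bytes");

enum class Check { ok, header_broken, footer_broken };

// Allocates header + payload + 1 footer byte and returns the payload.
inline void* tracked_malloc(size_t size) {
  void* raw = std::malloc(sizeof(Header) + size + 1);
  if (raw == nullptr) return nullptr;
  Header* h = new (raw) Header{size, 0, 0, kHeaderCanary};
  uint8_t* payload = reinterpret_cast<uint8_t*>(h + 1);
  payload[size] = kFooterCanary;  // trails the user allocation
  return payload;
}

// Verifies both canaries of a block returned by tracked_malloc.
inline Check check_block(const void* p) {
  const uint8_t* payload = static_cast<const uint8_t*>(p);
  const Header* h = reinterpret_cast<const Header*>(payload) - 1;
  if (h->canary != kHeaderCanary) return Check::header_broken;
  if (payload[h->size] != kFooterCanary) return Check::footer_broken;
  return Check::ok;
}

inline void tracked_free(void* p) {
  if (p == nullptr) return;
  std::free(static_cast<Header*>(p) - 1);
}

}  // namespace sketch
```

A caller would run `check_block` before freeing or reallocating a block and report the corrupted address when the result is not `Check::ok`; a one-byte write past the end of the payload is enough to flip the footer in this sketch.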
------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From coleenp at openjdk.java.net Fri Nov 19 17:53:37 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 19 Nov 2021 17:53:37 GMT Subject: Integrated: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for In-Reply-To: References: Message-ID: <5kXV0nOOuiIWZTtMqEYo2umU_x6R-xZnJgbF_tv12Uw=.bd70f60a-4230-4634-a665-899403bbc1fb@github.com> On Thu, 18 Nov 2021 21:56:58 GMT, Coleen Phillimore wrote: > Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass. > Tested with mach5 tier1-3. This pull request has now been integrated. Changeset: 09e8c8c6 Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/09e8c8c64abf4178a042c79b92d7e08e54467331 Stats: 16 lines in 2 files changed: 0 ins; 13 del; 3 mod 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for Reviewed-by: dholmes, hseigel ------------- PR: https://git.openjdk.java.net/jdk/pull/6466 From coleenp at openjdk.java.net Fri Nov 19 18:20:11 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 19 Nov 2021 18:20:11 GMT Subject: RFR: 8277342: vmTestbase/nsk/stress/strace/strace004.java fails with SIGSEGV in InstanceKlass::jni_id_for [v2] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 02:39:15 GMT, Coleen Phillimore wrote: >> Use the version jni_id_for_impl() as jni_id_for() that takes out the JFieldIdCreation_lock before reading jni_ids in InstanceKlass. >> Tested with mach5 tier1-3. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments. I should have pointed out that tier5 also completed with no failures. 
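
For readers, the idea in the quoted description (take the creation lock before reading the id list, rather than reading first and locking only to create) could be sketched as below. This is a generic illustration and not the `InstanceKlass` code: `Holder`, `FieldId` and `id_for` are hypothetical names, and `std::mutex` stands in for `JFieldIdCreation_lock`.

```cpp
#include <cassert>
#include <mutex>

namespace sketch {

// Hypothetical node in a singly linked list of ids, keyed by field offset.
struct FieldId {
  int offset;
  FieldId* next;
};

class Holder {
 public:
  Holder() = default;
  Holder(const Holder&) = delete;
  Holder& operator=(const Holder&) = delete;

  ~Holder() {
    while (head_ != nullptr) {
      FieldId* next = head_->next;
      delete head_;
      head_ = next;
    }
  }

  // Looks up the id for an offset, creating it if needed.
  // The lock is held for the read as well as for the insertion,
  // so a reader never walks the list while another thread links a node.
  FieldId* id_for(int offset) {
    std::lock_guard<std::mutex> guard(lock_);
    for (FieldId* p = head_; p != nullptr; p = p->next) {
      if (p->offset == offset) return p;
    }
    head_ = new FieldId{offset, head_};
    return head_;
  }

 private:
  std::mutex lock_;
  FieldId* head_ = nullptr;
};

}  // namespace sketch
```

The trade-off is that every lookup now pays for the lock; the replies in this thread report no measurable performance difference for the real change.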
-------------

PR: https://git.openjdk.java.net/jdk/pull/6466

From rkennke at openjdk.java.net Fri Nov 19 18:22:37 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Fri, 19 Nov 2021 18:22:37 GMT
Subject: RFR: 8277417: C1 LIR instruction for load-klass [v2]
In-Reply-To: 
References: 
Message-ID: 

> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When such a load is encountered, the result may be decoded when +UseCompressedClassPointers. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.
> 
> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.
> 
> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Notice that I could not test anything but x86; all other platforms only received very basic testing via GHA. It would be nice if the respective maintainers could give it a try.
> > Testing: > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix null-check on PPC ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6464/files - new: https://git.openjdk.java.net/jdk/pull/6464/files/988036fa..3454c1bf Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6464&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6464&range=00-01 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6464.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6464/head:pull/6464 PR: https://git.openjdk.java.net/jdk/pull/6464 From rkennke at openjdk.java.net Fri Nov 19 18:22:41 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 19 Nov 2021 18:22:41 GMT Subject: RFR: 8277417: C1 LIR instruction for load-klass [v2] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 17:25:31 GMT, Martin Doerr wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix null-check on PPC > > src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp line 2737: > >> 2735: if (info != NULL) { >> 2736: add_debug_info_for_null_check_here(info); >> 2737: } > > I think this is incorrect for AIX. Note that the first page is not read protected on that OS. 
To make it consistent with other places, I suggest: > > diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp > index a772e48f3be..23e03cb36e3 100644 > --- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp > +++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp > @@ -2733,7 +2733,11 @@ void LIR_Assembler::emit_load_klass(LIR_OpLoadKlass* op) { > > CodeEmitInfo* info = op->info(); > if (info != NULL) { > - add_debug_info_for_null_check_here(info); > + if (!os::zero_page_read_protected() || !ImplicitNullChecks) { > + explicit_null_check(obj, info); > + } else { > + add_debug_info_for_null_check_here(info); > + } > } > > if (UseCompressedClassPointers) { Thank you! I pushed a fix for that. ------------- PR: https://git.openjdk.java.net/jdk/pull/6464 From sspitsyn at openjdk.java.net Fri Nov 19 18:25:09 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Fri, 19 Nov 2021 18:25:09 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote: >> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. >> The fix is to add a check for is_exiting() status into handshake closure do_thread() early. >> There following handshake closures are fixed by this update: >> - UpdateForPopTopFrameClosure >> - SetForceEarlyReturn >> - SetFramePopClosure > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL Dan, thank you for review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From sspitsyn at openjdk.java.net Fri Nov 19 18:54:22 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Fri, 19 Nov 2021 18:54:22 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3] In-Reply-To: References: Message-ID: <0c7TGP1yfAhaJo8PLIkHEvNxZmlqIP-3Lr3tw_dO3wU=.71231bf7-4d0e-48d0-bb77-2e275ef0e652@github.com> On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote: >> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. >> The fix is to add a check for is_exiting() status into handshake closure do_thread() early. >> There following handshake closures are fixed by this update: >> - UpdateForPopTopFrameClosure >> - SetForceEarlyReturn >> - SetFramePopClosure > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL David, Thank you for your questions. I'm not sure if all of them are resolved though. :) Please, let me know if it is the case. ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From mdoerr at openjdk.java.net Fri Nov 19 18:54:22 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 19 Nov 2021 18:54:22 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote: >> The test fails when the target JavaThread has is_exiting() status. 
In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. >> The fix is to add a check for is_exiting() status into handshake closure do_thread() early. >> There following handshake closures are fixed by this update: >> - UpdateForPopTopFrameClosure >> - SetForceEarlyReturn >> - SetFramePopClosure > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL Still good. Thumbs up from my side. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6440 From smarks at openjdk.java.net Fri Nov 19 20:16:16 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Fri, 19 Nov 2021 20:16:16 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v4] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 00:14:18 GMT, Stuart Marks wrote: >> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. >> >> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421). > > Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: > > Remove Finalizer.Holder class. Regarding **jcmd** updates, I'm thinking maybe this would be better handled separately. There is the potential to update to `GC.finalizer_info` discussed previously. 
Looking at the **jcmd** tool docs, it seems like `GC.run_finalization` also ought to be updated. And maybe one or more of the other commands (maybe `VM.flags` or `VM.info`?) ought to list the finalization enabled or disabled status. And of course the tool's doc will need to be updated as well. ------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From bchristi at openjdk.java.net Fri Nov 19 20:27:20 2021 From: bchristi at openjdk.java.net (Brent Christian) Date: Fri, 19 Nov 2021 20:27:20 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v4] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 00:14:18 GMT, Stuart Marks wrote: >> Pretty much what it says. The new option controls a static member in InstanceKlass that's consulted to determine whether the finalization machinery is activated for instances when a class is loaded. A new native method is added so that this state can be queried from Java. This is used to control whether a finalizer thread is created and to disable the `System` and `Runtime::runFinalization` methods. Includes tests for the above. >> >> Adding an option to disable finalization is part of [JEP 421](https://openjdk.java.net/jeps/421). > > Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: > > Remove Finalizer.Holder class. Lib changes and tests look good ------------- Marked as reviewed by bchristi (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6442 From dholmes at openjdk.java.net Fri Nov 19 22:54:09 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 19 Nov 2021 22:54:09 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v4] In-Reply-To: References: Message-ID: <_dbLFHgXpFHVaUbTD5trbAHB01_HF-jA4FGc6kqCmO8=.de90977e-4624-453e-bc30-32acf46cbddc@github.com> On Fri, 19 Nov 2021 20:13:06 GMT, Stuart Marks wrote: >> Stuart Marks has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove Finalizer.Holder class. > > Regarding **jcmd** updates, I'm thinking maybe this would be better handled separately. There is the potential to update to `GC.finalizer_info` discussed previously. Looking at the **jcmd** tool docs, it seems like `GC.run_finalization` also ought to be updated. And maybe one or more of the other commands (maybe `VM.flags` or `VM.info`?) ought to list the finalization enabled or disabled status. And of course the tool's doc will need to be updated as well. @stuart-marks no issue with doing dcmd/jcmd changes separately, but I don't think we need to go too far with this. I had considered `GC.run_finalization` but it just says it calls `System.run_finalization` - so no change needed there as it will be documented in System.runFinalization. And `VM.flags` only reports `-XX` flag information. And `VM.info` doesn't seem appropriate for mentioning this either. So no further changes needed to the other Dcmds IMO and no need to update anything on the jcmd tool page either. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From smarks at openjdk.java.net Sat Nov 20 02:19:09 2021 From: smarks at openjdk.java.net (Stuart Marks) Date: Sat, 20 Nov 2021 02:19:09 GMT Subject: RFR: JDK-8276422 Add command-line option to disable finalization [v4] In-Reply-To: <_dbLFHgXpFHVaUbTD5trbAHB01_HF-jA4FGc6kqCmO8=.de90977e-4624-453e-bc30-32acf46cbddc@github.com> References: <_dbLFHgXpFHVaUbTD5trbAHB01_HF-jA4FGc6kqCmO8=.de90977e-4624-453e-bc30-32acf46cbddc@github.com> Message-ID: <5YHSN8bpKua9XcdiifEVMBT8Zqz9-bTOoDM1DJKp0HI=.f59a79c5-776e-4fd9-82b4-49d37795e0f1@github.com> On Fri, 19 Nov 2021 22:50:49 GMT, David Holmes wrote: >> Regarding **jcmd** updates, I'm thinking maybe this would be better handled separately. There is the potential to update to `GC.finalizer_info` discussed previously. Looking at the **jcmd** tool docs, it seems like `GC.run_finalization` also ought to be updated. And maybe one or more of the other commands (maybe `VM.flags` or `VM.info`?) ought to list the finalization enabled or disabled status. And of course the tool's doc will need to be updated as well. > > @stuart-marks no issue with doing dcmd/jcmd changes separately, but I don't think we need to go too far with this. I had considered `GC.run_finalization` but it just says it calls `System.run_finalization` - so no change needed there as it will be documented in System.runFinalization. And `VM.flags` only reports `-XX` flag information. And `VM.info` doesn't seem appropriate for mentioning this either. So no further changes needed to the other Dcmds IMO and no need to update anything on the jcmd tool page either. @dholmes-ora OK if you're confident that it's sufficient just to add `GC.finalizer_info` and nothing else, and no docs or additional testing, then I'll just drop in the code from that branch you posted. Of course I'll do a full build & test. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6442 From lmesnik at openjdk.java.net Sat Nov 20 06:15:30 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Sat, 20 Nov 2021 06:15:30 GMT Subject: RFR: 8265795: vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when running with JEP 416 Message-ID: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com> The VMObjectAlloc JVMTI event was not generated for objects created using MethodHandle. The fix adds posting of the event to Unsafe_AllocateInstance. While fixing this bug I noticed that the event is not posted in the intrinsic versions of many functions where it is used, including but not limited to clone(), invoke(), allocateInstance() and allocateUninitializedArray(). There might be other intrinsified functions (not their JVM_ENTRY analogs) that allocate objects without posting events. I think we need to implement some common way to handle this and cover it in another issue. ------------- Commit messages: - switch to collerctor. - test added.
- update - fix Changes: https://git.openjdk.java.net/jdk/pull/6478/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6478&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8265795 Stats: 148 lines in 4 files changed: 146 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6478.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6478/head:pull/6478 PR: https://git.openjdk.java.net/jdk/pull/6478 From sspitsyn at openjdk.java.net Sat Nov 20 13:29:08 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Sat, 20 Nov 2021 13:29:08 GMT Subject: RFR: 8265795: vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when running with JEP 416 In-Reply-To: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com> References: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com> Message-ID: <1xupnqHZy2mpOQLkG92XaND-T6ofJY4UvhZbh1poUng=.b6785d7b-e048-4e53-b0c9-e3cbf742c452@github.com> On Fri, 19 Nov 2021 15:32:24 GMT, Leonid Mesnik wrote: > The VMObjectAlloc JVMTI event was not generated for objects created using MethodHandle. The fix adds posting of the event to Unsafe_AllocateInstance. > > While fixing this bug I noticed that the event is not posted in the intrinsic versions of many functions where it is used, including but not limited to clone(), invoke(), allocateInstance() and allocateUninitializedArray(). There might be other intrinsified functions (not their JVM_ENTRY analogs) that allocate objects without posting events. > > I think we need to implement some common way to handle this and cover it in another issue. Hi Leonid, This fix looks good to me. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6478 From lmesnik at openjdk.java.net Sun Nov 21 00:13:09 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Sun, 21 Nov 2021 00:13:09 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote: >> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. >> The fix is to add an early check for is_exiting() status to the handshake closure do_thread(). >> The following handshake closures are fixed by this update: >> - UpdateForPopTopFrameClosure >> - SetForceEarlyReturn >> - SetFramePopClosure > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL Marked as reviewed by lmesnik (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From ngasson at openjdk.java.net Mon Nov 22 01:44:10 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 22 Nov 2021 01:44:10 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v6] In-Reply-To: References: Message-ID: <92o4fGbGUNrm39p_vfOJV2cIXw2lzq5CLYPhdlf4hwI=.87d970cc-7f40-4e10-8202-ecc3fd47d8be@github.com> On Tue, 16 Nov 2021 14:23:07 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack.
If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers.
These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge master > - Rename pauth_authenticate_or_strip_return_address > - Fix windows aarch64 by restoring pauth file split > - Don't keep LR live across restore_live_registers > - Merge master > - Document pauth functions && remove OS split > - Update UseROPProtection description > - Simplify branch protection configure check > - 8264130: PAC-RET protection for Linux/AArch64 > > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image.
A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. > - Add PAC assembly instructions > - ... and 2 more: https://git.openjdk.java.net/jdk/compare/b8d33a2a...deb17a56 LGTM and we did extensive jtreg testing internally (tier1 + hotspot_all, jdk_core). ------------- Marked as reviewed by ngasson (Reviewer).
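The sign-on-store, authenticate-on-load idea from the PR description quoted above can be sketched as a toy model. This is only an illustration under loud assumptions: the names `toy_mac`, `toy_sign` and `toy_authenticate`, the 48-bit address width, and the mixing function are all invented here. Real hardware uses dedicated instructions and a proper cipher, and nothing below is taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: addresses fit in the low 48 bits, leaving the top 16 for a signature.
constexpr std::uint64_t kAddrMask = 0x0000FFFFFFFFFFFFULL;
// Any odd multiplier works for this toy; it is NOT what real hardware computes.
constexpr std::uint64_t kOddMultiplier = 0x9E3779B97F4A7C15ULL;

// Toy signature over (address, modifier, key). Only the low 16 bits of the
// inputs influence the result, so this is far weaker than a real PAC.
inline std::uint64_t toy_mac(std::uint64_t addr, std::uint64_t modifier, std::uint64_t key) {
  return ((addr ^ modifier ^ key) * kOddMultiplier) & 0xFFFFULL;
}

// "Sign" a return address before it is spilled to the stack. The stack
// pointer serves as the modifier, so the signature is tied to one frame.
inline std::uint64_t toy_sign(std::uint64_t lr, std::uint64_t sp, std::uint64_t key) {
  const std::uint64_t addr = lr & kAddrMask;
  return addr | (toy_mac(addr, sp, key) << 48);
}

// "Authenticate" a value reloaded from the stack. On success the stripped
// address is written to *out; on failure the caller must not branch to it.
inline bool toy_authenticate(std::uint64_t signed_lr, std::uint64_t sp,
                             std::uint64_t key, std::uint64_t* out) {
  const std::uint64_t addr = signed_lr & kAddrMask;
  if ((signed_lr >> 48) != toy_mac(addr, sp, key)) {
    return false;  // models the fault described in the PR text
  }
  *out = addr;
  return true;
}
```

In this model, overwriting the saved address while leaving the old signature bits in place makes `toy_authenticate` fail, which corresponds to the "authentication check of the Link Register will fail" behaviour described in the quoted text.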
PR: https://git.openjdk.java.net/jdk/pull/6334 From dholmes at openjdk.java.net Mon Nov 22 01:46:09 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 22 Nov 2021 01:46:09 GMT Subject: RFR: 8265795: vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when running with JEP 416 In-Reply-To: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com> References: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com> Message-ID: On Fri, 19 Nov 2021 15:32:24 GMT, Leonid Mesnik wrote: > The VMObjectAlloc JVMTI event was not generated for objects created using MethodHandle. The fix adds posting of the event to Unsafe_AllocateInstance. > > While fixing this bug I noticed that the event is not posted in the intrinsic versions of many functions where it is used, including but not limited to clone(), invoke(), allocateInstance() and allocateUninitializedArray(). There might be other intrinsified functions (not their JVM_ENTRY analogs) that allocate objects without posting events. > > I think we need to implement some common way to handle this and cover it in another issue. Hi Leonid, Functional fix looks good. A couple of minor nits below. I agree that fixing intrinsics should be a separate issue - I have to worry that the overhead of posting events can dwarf the operation itself. I would guess the intrinsic would need a short-cut to check if the event is enabled and if so drop back to the non-intrinsic version. Thanks, David test/hotspot/jtreg/serviceability/jvmti/VMObjectAlloc/VMObjectAllocTest.java line 49: > 47: mh.invoke("str"); > 48: > 49: if(getNumberOfAllocation() != 1) { space after 'if' please test/hotspot/jtreg/serviceability/jvmti/VMObjectAlloc/libVMObjectAlloc.cpp line 91: > 89: } > 90: > 91: } This looks spurious ?? ------------- Marked as reviewed by dholmes (Reviewer).
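The short-cut David guesses at (check whether the event is enabled, and if so drop back to the non-intrinsic version) might look roughly like the following toy model. It is a sketch only: `ToyVM` and its members are hypothetical stand-ins invented for this note, not HotSpot or JVMTI APIs, and the counters exist purely to make the two paths observable.

```cpp
#include <cassert>

// Hypothetical model of an allocation entry point with a fast and a slow path.
struct ToyVM {
  bool vm_object_alloc_enabled = false;  // stands in for "is the event enabled?"
  int events_posted = 0;                 // how many events the slow path posted
  int fast_path_allocations = 0;         // how many allocations skipped the slow path

  // Slow path, in the spirit of a runtime entry: allocate, then post the
  // event when an agent has enabled it.
  void allocate_slow() {
    if (vm_object_alloc_enabled) {
      ++events_posted;
    }
  }

  // Fast path, in the spirit of an intrinsic: it posts nothing itself, so it
  // first checks the flag and falls back to the slow path when it is set.
  void allocate() {
    if (vm_object_alloc_enabled) {
      allocate_slow();
      return;
    }
    ++fast_path_allocations;
  }
};
```

The point of the design is that the common case (no agent listening) pays only for one flag test, which addresses the worry that posting events could dwarf the operation itself.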
PR: https://git.openjdk.java.net/jdk/pull/6478 From ngasson at openjdk.java.net Mon Nov 22 02:03:13 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 22 Nov 2021 02:03:13 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v6] In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 14:23:07 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. 
If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge master > - Rename pauth_authenticate_or_strip_return_address > - Fix windows aarch64 by restoring pauth file split > - Don't keep LR live across restore_live_registers > - Merge master > - Document pauth functions && remove OS split > - Update UseROPProtection description > - Simplify branch protection configure check > - 8264130: PAC-RET protection for Linux/AArch64 > > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns.
> > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches.
> > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. > - Add PAC assembly instructions > - ... and 2 more: https://git.openjdk.java.net/jdk/compare/b8d33a2a...deb17a56 We're adding a product option `UseROPProtection` which needs a CSR according to https://wiki.openjdk.java.net/display/HotSpot/Hotspot+Command-line+Flags%3A+Kinds%2C+Lifecycle+and+the+CSR+Process ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From dholmes at openjdk.java.net Mon Nov 22 02:08:04 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 22 Nov 2021 02:08:04 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote: >> The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already cleaned up its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. >> The fix is to add an early check for is_exiting() status to the handshake closure do_thread(). >> The following handshake closures are fixed by this update: >> - UpdateForPopTopFrameClosure >> - SetForceEarlyReturn >> - SetFramePopClosure > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL Hi Serguei, I still feel the bug here can be fixed simply by moving assertions, rather than by introducing a change in behaviour as to what error code would be returned. But I'll leave it to serviceability folk to decide.
Thanks, David ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From dholmes at openjdk.java.net Mon Nov 22 02:08:04 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 22 Nov 2021 02:08:04 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3] In-Reply-To: References: <2l9gjieNV6K8UMLcGHO_CtSWzzN5Kv45pFt6_3OZ85o=.1ae38c09-bf00-45ec-ac96-838469a5f7a7@github.com> Message-ID: On Fri, 19 Nov 2021 17:04:42 GMT, Daniel D. Daugherty wrote: >> Okay. I see a similar check in the `force_early_return()` function: >> >> if (state == NULL) { >> return JVMTI_ERROR_THREAD_NOT_ALIVE; >> } >> >> Would it be better to replace it with this check instead? : >> >> if (java_thread->is_exiting()) { >> return JVMTI_ERROR_THREAD_NOT_ALIVE; >> } >> >> Removing this check and keeping the one inside the handshake would be even better. >> >> I would also add this line for symmetry with two other cases: >> >> + MutexLocker mu(JvmtiThreadState_lock); >> SetForceEarlyReturn op(state, value, tos); > > My point is that I don't see why you added the `is_exiting()` check > since I don't see a race in that function, i.e., there's no `assert()` in > this function that you need to protect. > > As for adding the `MutexLocker mu(JvmtiThreadState_lock)`, you'll > have to analyze and justify why you would need to add that lock grab > independent of this fix. I'm not seeing a bug there, but I haven't looked > very closely. The `is_exiting` check changes the behaviour from reporting JVMTI_ERROR_THREAD_NOT_SUSPENDED to JVMTI_ERROR_THREAD_NOT_ALIVE. Arguably it is a more precise answer, but it is somewhat splitting hairs. To me it might be clearer to the developer what their logic error is if they get NOT_SUSPENDED rather than NOT_ALIVE. Either way this change is not needed to fix any known bug and the change in behaviour seems questionable. >> Dan, >> Thank you for reviewing this!
>> I'm not sure, I correctly understand you here. >> Are you saying that you agree with this change? >> In fact, the thread state can not be changed (and the assert fired) after the `is_exiting()` check is made even without `JvmtiThreadState_lock` protection because it is inside of a handshake execution. > > I agree with the `is_exiting()` check addition. > > I forgot that we're executing a Handshake `doit()` function. So we have a couple > of reasons why an unsuspended target thread can't change from `!is_exiting()` > to `is_exiting()` while we are in this function. Again this introduces a more precise state check but also changes the behaviour by now reporting NOT_ALIVE instead of NOT_SUSPENDED. The assertion failure can be fixed by simply moving the assertion to after the suspension check. ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From dholmes at openjdk.java.net Mon Nov 22 02:08:05 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 22 Nov 2021 02:08:05 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3] In-Reply-To: <2l9gjieNV6K8UMLcGHO_CtSWzzN5Kv45pFt6_3OZ85o=.1ae38c09-bf00-45ec-ac96-838469a5f7a7@github.com> References: <2l9gjieNV6K8UMLcGHO_CtSWzzN5Kv45pFt6_3OZ85o=.1ae38c09-bf00-45ec-ac96-838469a5f7a7@github.com> Message-ID: <9_cJhg6lSbDRvLkCZAYYpWwKYWi5gBefhTeGyvOtHGw=.8783abed-f787-4a74-85e2-da8659f9edca@github.com> On Fri, 19 Nov 2021 10:15:05 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1625: >> >>> 1623: return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */ >>> 1624: } >>> 1625: assert(_state->get_thread() == java_thread, "Must be"); >> >> The `assert()` on L1625 is subject to the same race as the original site. >> This `is_exiting()` check is made under the protection of the >> `JvmtiThreadState_lock` so it is sufficient to protect that `assert()`. > > Okay, thanks! Same comment as above. 
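The two ways of avoiding the assertion failure discussed in this thread (an early `is_exiting()` check versus moving the assertion after the suspension check) can be compared in a small toy model. Everything below is a hypothetical stand-in invented for this note: `ToyThread`, `ToyThreadState` and the `TOY_*` codes only mimic the shape of the real types, and a null pointer stands in for the stale `0xbabababababababa` value seen in debug builds.

```cpp
#include <cassert>

// Toy error codes mirroring the two JVMTI errors discussed in the thread.
enum ToyError { TOY_NONE, TOY_THREAD_NOT_ALIVE, TOY_THREAD_NOT_SUSPENDED };

struct ToyThread {
  bool exiting = false;
  bool suspended = false;
  bool is_exiting() const { return exiting; }
  bool is_suspended() const { return suspended; }
};

struct ToyThreadState {
  const ToyThread* thread;  // nullptr models state that was already cleaned up
  const ToyThread* get_thread() const { return thread; }
};

// Variant 1: check is_exiting() first, so the assertion never sees stale state.
inline ToyError check_exiting_first(const ToyThread& target, const ToyThreadState& state) {
  if (target.is_exiting()) {
    return TOY_THREAD_NOT_ALIVE;
  }
  assert(state.get_thread() == &target);
  if (!target.is_suspended()) {
    return TOY_THREAD_NOT_SUSPENDED;
  }
  return TOY_NONE;
}

// Variant 2: keep the old error code and move the assertion after the
// suspension check; this relies on an exiting thread never being suspended.
inline ToyError check_suspended_first(const ToyThread& target, const ToyThreadState& state) {
  if (!target.is_suspended()) {
    return TOY_THREAD_NOT_SUSPENDED;
  }
  assert(state.get_thread() == &target);
  return TOY_NONE;
}
```

For an exiting, unsuspended target the first variant reports `TOY_THREAD_NOT_ALIVE` and the second reports `TOY_THREAD_NOT_SUSPENDED`, which is the behavioural difference David points out; neither variant trips the assertion in that case.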
------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From lmesnik at openjdk.java.net Mon Nov 22 04:23:00 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Mon, 22 Nov 2021 04:23:00 GMT Subject: RFR: 8265795: vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when running with JEP 416 [v2] In-Reply-To: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com> References: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com> Message-ID: > The VMObjectAlloc JVMTI event was not generated for objects created using MethodHandle. The fix adds posting of the event to Unsafe_AllocateInstance. > > While fixing this bug I noticed that the event is not posted in the intrinsic versions of many functions where it is used, including but not limited to clone(), invoke(), allocateInstance() and allocateUninitializedArray(). There might be other intrinsified functions (not their JVM_ENTRY analogs) that allocate objects without posting events. > > I think we need to implement some common way to handle this and cover it in another issue.
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: fixed ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6478/files - new: https://git.openjdk.java.net/jdk/pull/6478/files/e160dbe3..b37ee052 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6478&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6478&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6478.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6478/head:pull/6478 PR: https://git.openjdk.java.net/jdk/pull/6478 From lmesnik at openjdk.java.net Mon Nov 22 04:23:01 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Mon, 22 Nov 2021 04:23:01 GMT Subject: RFR: 8265795: vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when running with JEP 416 [v2] In-Reply-To: References: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com> Message-ID: On Mon, 22 Nov 2021 01:38:47 GMT, David Holmes wrote: >> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed > > test/hotspot/jtreg/serviceability/jvmti/VMObjectAlloc/VMObjectAllocTest.java line 49: > >> 47: mh.invoke("str"); >> 48: >> 49: if(getNumberOfAllocation() != 1) { > > space after 'if' please fixed ------------- PR: https://git.openjdk.java.net/jdk/pull/6478 From shade at openjdk.java.net Mon Nov 22 09:21:41 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 22 Nov 2021 09:21:41 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v4] In-Reply-To: References: Message-ID: > This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler. > > Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. 
The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers. > > Additional testing: > - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now passes > - [x] Linux x86_64 Zero works with `async-profiler` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Fix a comment - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace - More reviews - Review feedback - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace - Initial work: runs async-profiler successfully ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5848/files - new: https://git.openjdk.java.net/jdk/pull/5848/files/68ef4b63..bc4ba33b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=02-03 Stats: 44745 lines in 800 files changed: 32663 ins; 5661 del; 6421 mod Patch: https://git.openjdk.java.net/jdk/pull/5848.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5848/head:pull/5848 PR: https://git.openjdk.java.net/jdk/pull/5848 From sspitsyn at openjdk.java.net Mon Nov 22 09:26:03 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Mon, 22 Nov 2021 09:26:03 GMT Subject: RFR: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" [v3] In-Reply-To: References: Message-ID: On Thu, 18 Nov 2021 09:34:13 GMT, Serguei Spitsyn wrote: >> The test fails when the target JavaThread has is_exiting() status.
In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. >> The fix is to add a check for is_exiting() status into handshake closure do_thread() early. >> The following handshake closures are fixed by this update: >> - UpdateForPopTopFrameClosure >> - SetForceEarlyReturn >> - SetFramePopClosure > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > get rid of the checks in jvmti handshakes: java_thread->threadObj() == NULL Hi David, Thank you for looking at this and your comments. Exiting thread should not be in suspended state. Also, I'm pretty sure that the THREAD_NOT_ALIVE error code should normally take priority. So, I prefer the current fix over moving the assert. But I kind of understand your concern. Thank you for sharing it! Thanks, Serguei ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From ngasson at openjdk.java.net Mon Nov 22 10:42:13 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 22 Nov 2021 10:42:13 GMT Subject: RFR: 8277417: C1 LIR instruction for load-klass [v2] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 18:22:37 GMT, Roman Kennke wrote: >> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When encountering such load, this may be decoded when +UseCompressedClassPointers. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far. >> >> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including runtime call.
>> >> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Notice that I could not test anything but x86, all other platforms only received very basic testing via GHA. It would be nice if respective maintainers could give it a try. >> >> Testing: >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix null-check on PPC I tested tier1 on 32-bit Arm and AArch64. 32-bit Arm had some failures but they don't seem to be related to this patch. src/hotspot/cpu/arm/c1_LIRAssembler_arm.cpp line 2453: > 2451: } > 2452: > 2453: if (UseCompressedClassPointers) { // On 32 bit arm?? It's probably leftover from when the "arm" port supported both 32- and 64-bit. ------------- Marked as reviewed by ngasson (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6464 From sspitsyn at openjdk.java.net Mon Nov 22 10:51:13 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Mon, 22 Nov 2021 10:51:13 GMT Subject: Integrated: 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" In-Reply-To: References: Message-ID: On Wed, 17 Nov 2021 22:21:33 GMT, Serguei Spitsyn wrote: > The test fails when the target JavaThread has is_exiting() status. In such a case the JvmtiExport::cleanup_thread(this) has already made a clean up of its jvmtiThreadState, so the JavaThread address returned by _state->get_thread() is 0xbabababababababa. > The fix is to add a check for is_exiting() status into handshake closure do_thread() early. > The following handshake closures are fixed by this update: > - UpdateForPopTopFrameClosure > - SetForceEarlyReturn > - SetFramePopClosure This pull request has now been integrated.
Changeset: 32839ba0 Author: Serguei Spitsyn URL: https://git.openjdk.java.net/jdk/commit/32839ba012f0a0a66e249cd8d12b94499d82ec0a Stats: 22 lines in 2 files changed: 10 ins; 6 del; 6 mod 8266593: vmTestbase/nsk/jvmti/PopFrame/popframe011 fails with "assert(java_thread == _state->get_thread()) failed: Must be" Reviewed-by: mdoerr, lmesnik, dcubed ------------- PR: https://git.openjdk.java.net/jdk/pull/6440 From aph at openjdk.java.net Mon Nov 22 10:58:14 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 22 Nov 2021 10:58:14 GMT Subject: RFR: 8277417: C1 LIR instruction for load-klass [v2] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 18:22:37 GMT, Roman Kennke wrote: >> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When encountering such load, this may be decoded when +UseCompressedClassPointers. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far. >> >> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including runtime call. >> >> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Notice that I could not test anything but x86, all other platforms only received very basic testing via GHA. It would be nice if respective maintainers could give it a try. >> >> Testing: >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix null-check on PPC Thanks, a very welcome fix.
I wish I had done something like this at the time of the AArch64 port, but I was neither brave enough nor knew enough src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 991: > 989: // FIXME: OMG this is a horrible kludge. Any offset from an > 990: // address that matches klass_offset_in_bytes() will be loaded > 991: // as a word, not a long. Ha! I am so glad to see this horrible kludge removed. ------------- Marked as reviewed by aph (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6464 From mcimadamore at openjdk.java.net Mon Nov 22 12:02:47 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 22 Nov 2021 12:02:47 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v25] In-Reply-To: References: Message-ID: > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix javadoc issues found in CSR review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5907/files - new: https://git.openjdk.java.net/jdk/pull/5907/files/79d3d685..1817975f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=24 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=23-24 Stats: 10 lines in 4 files changed: 2 ins; 6 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From mcimadamore at openjdk.java.net Mon Nov 22 12:09:30 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 22 Nov 2021 12:09:30 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v26] In-Reply-To: 
References: Message-ID: > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits: - Merge branch 'master' into JEP-419 - Fix javadoc issues found in CSR review - Adopt blessed modofier order - Merge branch 'master' into JEP-419 - Revert removal of upcall MH customization (This change caused spurious VM crashes, so reverting to baseline) - Further tweak upcall safety considerations - Clarify safety considerations for upcalls - Rename MemorySegment::ofAddressNative to MemorySegment::ofAddress (which is consistent with other restricted factories in VaList and NativeSymbol) - Streamline javadoc for package-info - * Add two new CLinker static methods to compute upcall/downcall method types * Clarify section on CLinker downcall type * Add section on CLinker safety guarantees - ... 
and 25 more: https://git.openjdk.java.net/jdk/compare/d427c79d...29cc6c60 ------------- Changes: https://git.openjdk.java.net/jdk/pull/5907/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5907&range=25 Stats: 14700 lines in 193 files changed: 6958 ins; 5126 del; 2616 mod Patch: https://git.openjdk.java.net/jdk/pull/5907.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5907/head:pull/5907 PR: https://git.openjdk.java.net/jdk/pull/5907 From mdoerr at openjdk.java.net Mon Nov 22 12:44:10 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 22 Nov 2021 12:44:10 GMT Subject: RFR: 8277417: C1 LIR instruction for load-klass [v2] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 18:22:37 GMT, Roman Kennke wrote: >> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When encountering such load, this may be decoded when +UseCompressedClassPointers. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far. >> >> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including runtime call. >> >> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg(). Notice that I could not test anything but x86, all other platforms only received very basic testing via GHA. It would be nice if respective maintainers could give it a try. >> >> Testing: >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix null-check on PPC Nice change! Please remove the duplicated `info != NULL` check before integrating.
------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6464 From mdoerr at openjdk.java.net Mon Nov 22 12:44:11 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 22 Nov 2021 12:44:11 GMT Subject: RFR: 8277417: C1 LIR instruction for load-klass [v2] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 18:18:28 GMT, Roman Kennke wrote: >> src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp line 2737: >> >>> 2735: if (info != NULL) { >>> 2736: add_debug_info_for_null_check_here(info); >>> 2737: } >> >> I think this is incorrect for AIX. Note that the first page is not read protected on that OS. To make it consistent with other places, I suggest: >> >> diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp >> index a772e48f3be..23e03cb36e3 100644 >> --- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp >> +++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp >> @@ -2733,7 +2733,11 @@ void LIR_Assembler::emit_load_klass(LIR_OpLoadKlass* op) { >> >> CodeEmitInfo* info = op->info(); >> if (info != NULL) { >> - add_debug_info_for_null_check_here(info); >> + if (!os::zero_page_read_protected() || !ImplicitNullChecks) { >> + explicit_null_check(obj, info); >> + } else { >> + add_debug_info_for_null_check_here(info); >> + } >> } >> >> if (UseCompressedClassPointers) { > > Thank you! I pushed a fix for that. Unfortunately, we have the `info != NULL` check twice, now. Otherwise, good. ------------- PR: https://git.openjdk.java.net/jdk/pull/6464 From zgu at openjdk.java.net Mon Nov 22 13:44:07 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 22 Nov 2021 13:44:07 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v6] In-Reply-To: References: Message-ID: On Fri, 19 Nov 2021 14:29:17 GMT, Thomas Stuefe wrote: >> This is part of a number of RFE I plan to improve and simplify C-heap overflow checking in hotspot. 
>> >> This proposal adds NMT buffer overflow checking: >> >> - it gives us C-heap overflow checking in release builds >> - the costs are neglectable: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway. >> - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check. >> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch. >> >> For more details, please see the JBS issue. >> >> ---- >> >> Patch notes: >> >> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp. >> - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512. >> - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere. >> >> - I added a footer canary trailing the user allocation to catch tail buffer overruns. 
To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios. >> >> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting. >> >> - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE). >> >> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now. >> >> - I made the assert for malloc site table width a compile time STATIC_ASSERT. >> >> -------------- >> >> Example output a buffer overrun would provide: >> >> >> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) >> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: >> 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 >> 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00 >> 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)# >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (mallocTracker.cpp:203), pid=10805, 
tid=10805 >> # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) >> # >> >> ------- >> >> Tests: >> - manual tests with Linux x64, x86, minimal build >> - GHAs all clean >> - SAP nightlies ran for 4 weeks now without problems > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Volker Feedback 2 > - Fix Zhengyu Problem in os::realloc > - Extend gtests > - extend footer to 2 bytes > - Feedback Volker > - Let NMT do overflow detection Marked as reviewed by zgu (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From eosterlund at openjdk.java.net Mon Nov 22 14:03:34 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 22 Nov 2021 14:03:34 GMT Subject: RFR: 8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper Message-ID: The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is constructed right before running a parallel operation with a work gang, but freed in the destructor of the VM_HeapDumper. This means it is created on one thread and deleted on another thread. This becomes a bit problematic when a parallel object iterator implementation uses a ThreadsListHandle (which is indeed the case for ZGC). This patch changes ParallelObjectIterator to be a StackObj, carrying a ParallelObjectIteratorImpl object, which is never exposed publicly. This ensures that construction and destruction of the internal object iterator is scoped like RAII objects, hence complying with how ThreadsListHandle is supposed to be used. 
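For future readers, the scoping idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: `ObjectIteratorImpl`, `CountingImpl`, `ScopedObjectIterator` and `run_dump` are hypothetical names, `std::unique_ptr` stands in for whatever ownership mechanism HotSpot actually uses, and HotSpot's `StackObj` is approximated by deleting `operator new`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-in for a collector-specific iterator implementation.
class ObjectIteratorImpl {
public:
  virtual ~ObjectIteratorImpl() = default;
  virtual void object_iterate(int worker_id) = 0;
};

// Bookkeeping so the lifetime of the implementation object can be observed.
static int live_impls = 0;
static int visits = 0;

class CountingImpl : public ObjectIteratorImpl {
public:
  CountingImpl() { ++live_impls; }
  ~CountingImpl() override { --live_impls; }
  void object_iterate(int /*worker_id*/) override { ++visits; }
};

// Stack-only wrapper: the implementation is created and destroyed together
// with the wrapper, in the same scope and therefore on the same thread.
class ScopedObjectIterator {
  std::unique_ptr<ObjectIteratorImpl> _impl;
public:
  explicit ScopedObjectIterator(std::unique_ptr<ObjectIteratorImpl> impl)
    : _impl(std::move(impl)) {}
  ScopedObjectIterator(const ScopedObjectIterator&) = delete;
  ScopedObjectIterator& operator=(const ScopedObjectIterator&) = delete;
  static void* operator new(std::size_t) = delete;  // approximates StackObj

  void object_iterate(int worker_id) { _impl->object_iterate(worker_id); }
};

// Hypothetical driver: the iterator lives only for the duration of the
// iteration itself. Returns how many implementations were alive meanwhile.
int run_dump(int workers) {
  int live_during_iteration = 0;
  {
    ScopedObjectIterator iter(std::make_unique<CountingImpl>());
    for (int i = 0; i < workers; i++) {
      iter.object_iterate(i);
    }
    live_during_iteration = live_impls;
  }  // iter and its implementation are destroyed here
  return live_during_iteration;
}
```

The point of the sketch is only that nothing owned by the iterator can outlive the block that created it, which is the property a ThreadsListHandle-style member would rely on.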
------------- Commit messages: - 8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper Changes: https://git.openjdk.java.net/jdk/pull/6501/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6501&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276696 Stats: 70 lines in 15 files changed: 35 ins; 11 del; 24 mod Patch: https://git.openjdk.java.net/jdk/pull/6501.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6501/head:pull/6501 PR: https://git.openjdk.java.net/jdk/pull/6501 From lmesnik at openjdk.java.net Mon Nov 22 17:14:29 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Mon, 22 Nov 2021 17:14:29 GMT Subject: Integrated: 8265795: vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when running with JEP 416 In-Reply-To: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com> References: <_uX5GXu4fEzqseTEdOTcEH0HKiwQ8jeccft8kd5_Hcg=.3c1b3d8b-63f0-4d1b-a238-a10e86dd012c@github.com> Message-ID: On Fri, 19 Nov 2021 15:32:24 GMT, Leonid Mesnik wrote: > The VMObjectAlloc jvmti event was not generated for objects created using MethodHandle. The fix adds posting of the event into Unsafe_AllocateInstance. > > While fixing this bug I noticed that the event is not posted in the intrinsic versions of many functions where it is used, including but not limited to clone(), invoke(), allocateInstance() and allocateUninitializedArray(). There might be other intrinsified functions (as opposed to their JVM_ENTRY versions) that allocate objects without posting events. > > I think some common way to handle this needs to be implemented, and that should be covered in another issue. This pull request has now been integrated.
Changeset: 33e2a518 Author: Leonid Mesnik URL: https://git.openjdk.java.net/jdk/commit/33e2a518ebcd50e76c559512539fd7c864fd2407 Stats: 148 lines in 4 files changed: 146 ins; 2 del; 0 mod 8265795: vmTestbase/nsk/jvmti/AttachOnDemand/attach022/TestDescription.java fails when running with JEP 416 Reviewed-by: sspitsyn, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/6478 From duke at openjdk.java.net Mon Nov 22 17:35:41 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 22 Nov 2021 17:35:41 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v7] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. 
> > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge master - Merge master - Rename pauth_authenticate_or_strip_return_address - Fix windows aarch64 by restoring pauth file split - Don't keep LR live across restore_live_registers - Merge master - Document pauth functions && remove OS split - Update UseROPProtection description - Simplify branch protection configure check - 8264130: PAC-RET protection for Linux/AArch64 PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One of its uses is to protect against ROP based attacks. This is done by signing the Link Register whenever it is stored on the stack, and authenticating the value when it is loaded back from the stack.
If an attacker were to try to change control flow by editing the stack then the authentication check of the Link Register will fail, causing a segfault when the function returns. On a system with PAC enabled, it is expected that all applications will be compiled with ROP protection. Fedora 33 and upwards already provide this. By compiling for ARMv8.0, GCC and LLVM will only use the set of PAC instructions that exist in the NOP space - on hardware without PAC, these instructions act as NOPs, allowing backward compatibility for negligible performance cost (2 NOPs per non-leaf function). Hardware is currently limited to the Apple M1 MacBooks. All testing has been done within a Fedora Docker image. A run of SpecJVM showed no difference to that of noise - which was surprising. The most important part of this patch is simply compiling using branch protection provided by GCC/LLVM. This protects all C++ code from being used in ROP attacks, removing all static ROP gadgets from use. The remainder of the patch adds ROP protection to runtime generated code, in both stubs and compiled Java code. Attacks here are much harder as ROP gadgets must be found dynamically at runtime. If/when AOT compilation is added to JDK, then all stubs and compiled Java will be susceptible to ROP gadgets being found by static analysis and therefore potentially as vulnerable as C++ code. There are a number of places where the VM changes control flow by rewriting the stack or otherwise. I've done some analysis as to how these could also be used for attacks (which I didn't want to post here). These areas can be protected by ensuring the pointers to various stubs and entry points are stored in memory as signed pointers. These changes are simple to make (they can be reduced to a type change in common code and a few additional sign/auth calls in the backend), but there are a lot of them and the total code change is fairly large. I'm happy to provide a few work in progress patches.
In order to match the security benefits of the Apple Arm64e ABI across the whole of JDK, then all the changes mentioned above would be required. - ... and 3 more: https://git.openjdk.java.net/jdk/compare/ca31ed53...280abc41 ------------- Changes: https://git.openjdk.java.net/jdk/pull/6334/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=06 Stats: 1381 lines in 25 files changed: 517 ins; 18 del; 846 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Mon Nov 22 17:35:45 2021 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 22 Nov 2021 17:35:45 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v6] In-Reply-To: References: Message-ID: On Tue, 16 Nov 2021 14:23:07 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. 
A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 12 commits: > > - Merge master > - Rename pauth_authenticate_or_strip_return_address > - Fix windows aarch64 by restoring pauth file split > - Don't keep LR live across restore_live_registers > - Merge master > - Document pauth functions && remove OS split > - Update UseROPProtection description > - Simplify branch protection configure check > - 8264130: PAC-RET protection for Linux/AArch64 > > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. 
If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. > - Add PAC assembly instructions > - ... and 2 more: https://git.openjdk.java.net/jdk/compare/b8d33a2a...deb17a56 CSR added: https://bugs.openjdk.java.net/browse/JDK-8277543 ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From jorn.vernee at oracle.com Mon Nov 22 19:19:31 2021 From: jorn.vernee at oracle.com (Jorn Vernee) Date: Mon, 22 Nov 2021 20:19:31 +0100 Subject: Questions about oop handling for Panama upcalls. In-Reply-To: <89b42995-1504-d3cc-1d37-595610b75801@oracle.com> References: <700ffdf2-f63d-7d91-828a-d41e9aa433e5@oracle.com> <9e2fa731-ff0c-3497-eda0-2ca394a1f33b@oracle.com> <89b42995-1504-d3cc-1d37-595610b75801@oracle.com> Message-ID: One more comment on this thread for future readers: As mentioned before, I had noticed that deoptimization code would reconstitute the receiver oop in the upcall stub's frame, so I added an extra stack word for that in the upcall frame (thinking at that time that the caller was supposed to make room for the receiver on the stack).
But upon recent inspection of c2i adapter code, I noticed that the c2i adapter should already be making room for the receiver as well, so there should theoretically be no need for those extra stack words. It turns out that the deopt code will not recreate the c2i adapter when doing a deopt. For that to work for compiled callers, the stack needs to be adjusted to make room for the parameters (as the c2i adapter does), as well as extra locals. The space that is needed is calculated in Deoptimization::fetch_unroll_info_helper by the following code:

  // Compute the amount the oldest interpreter frame will have to adjust
  // its caller's stack by. If the caller is a compiled frame then
  // we pretend that the callee has no parameters so that the
  // extension counts for the full amount of locals and not just
  // locals-parms. This is because without a c2i adapter the parm
  // area as created by the compiled frame will not be usable by
  // the interpreter. (Depending on the calling convention there
  // may not even be enough space).

  // QQQ I'd rather see this pushed down into last_frame_adjust
  // and have it take the sender (aka caller).

  if (deopt_sender.is_compiled_frame() || caller_was_method_handle) {
    caller_adjustment = last_frame_adjust(0, callee_locals);
  } else if (callee_locals > callee_parameters) {
    // The caller frame may need extending to accommodate
    // non-parameter locals of the first unpacked interpreted frame.
    // Compute that adjustment.
    caller_adjustment = last_frame_adjust(callee_parameters, callee_locals);
  }

I think you can probably spot the problem from this: we are doing a compiled call in the upcall stub, but the if-statement is not catching that case, so we don't make enough space on the stack. (In the case of method handles a pessimization seems to be used, since it's not known how much room the caller has on the stack for the parameters.)
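For future readers, the effect of that branch can be sketched in isolation. This is a simplified model and not HotSpot code: `FrameKind`, `model_last_frame_adjust`, `model_caller_adjustment` and the `treat_upcall_stub_as_compiled` switch are hypothetical names, and the adjustment is counted in plain stack slots, whereas the real last_frame_adjust is platform specific. It only illustrates why a caller that is not recognized as compiled gets a smaller extension.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical classification of the frame that called the deoptimized method.
enum class FrameKind { Interpreted, Compiled, UpcallStub };

// Simplified stand-in for last_frame_adjust: extra slots needed so that the
// callee's locals fit, given how many parameter slots are already usable.
inline int model_last_frame_adjust(int callee_parameters, int callee_locals) {
  assert(callee_parameters >= 0 && callee_locals >= 0);
  return std::max(0, callee_locals - callee_parameters);
}

// Models the if/else quoted above. The last argument is a hypothetical switch
// that lets the upcall stub be treated like a compiled caller, purely so the
// two outcomes can be compared.
inline int model_caller_adjustment(FrameKind sender,
                                   bool caller_was_method_handle,
                                   int callee_parameters,
                                   int callee_locals,
                                   bool treat_upcall_stub_as_compiled) {
  const bool compiled_like =
      sender == FrameKind::Compiled ||
      (treat_upcall_stub_as_compiled && sender == FrameKind::UpcallStub);
  if (compiled_like || caller_was_method_handle) {
    // Pretend the callee has no parameters: extend by the full locals count.
    return model_last_frame_adjust(0, callee_locals);
  }
  if (callee_locals > callee_parameters) {
    // Only the non-parameter locals are added.
    return model_last_frame_adjust(callee_parameters, callee_locals);
  }
  return 0;
}
```

With 3 parameters and 5 locals, the model extends a compiled caller by 5 slots but an unrecognized upcall stub by only 2, which is the shortfall described above.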
Jorn On 17/11/2021 16:35, Jorn Vernee wrote: > On 17/11/2021 16:14, Erik Osterlund wrote: >> Hi Jorn, >> >> In the interpreter world, the expression stack at the call site >> becomes the locals >> of the callee. So everything is passed through the stack. So the >> upcall stub sets >> things up like an interpreter method would have (quack quack), and >> calls the >> i2c adapter if there is an nmethod (quack quack), which will >> transform the >> arguments to the compiled convention of the callee. The argument >> ownership >> then switches from the caller to the callee, once the callee can >> manifest on the >> stack. But if there are safepoints inbetween, then the caller owns >> the arguments >> until its callee manifests. > Okay, thanks, that makes sense. This probably explains why not > implementing preserve_callee_argument_oops for the upcall stubs didn't > cause any problems so far. There probably just weren't any safepoints > in between the call from the stub and the callee setting up it's > frame. (although I'm still a bit confused here why the callee doesn't > make space for the receiver in it's frame as well). >> Do you want to avoid the pretend to be the interpreter step because >> it is costly >> in the Panama world to spill arguments to the stack? > I think either one could "work", although it seems like interpreter > calls require more setup of meta data around calls (which would be > unneeded if we called into an nmethod I think?). Also, we generate an > argument shuffle from the native convention to the Java calling > convention (this is unavoidable). If the native convention passes > arguments in the same registers that the Java convention expects them > in we don't have to generate code for that in the shuffle. > Theoretically we could also do a pass to minimize the needed shuffle > by reordering parameters on the MethodHandle. 
If we went with an > interpreted calling convention, we would always have to copy across > arguments to the stack, in a shuffle-ish manner (right now we rely on > SharedRuntime::java_calling_convention to compute the target > registers. Would have to implement something similar for the > interpreter convention). > > It seems to me that in the long run, going with the Java compiled > calling convention for the upcall is the right choice if we want to be > able to squeeze out as much speed as possible. > > Jorn >> >> /Erik >> >>> -----Original Message----- >>> From: Jorn Vernee >>> Sent: Wednesday, 17 November 2021 15:49 >>> To: Erik Osterlund ; hotspot- >>> dev at openjdk.java.net >>> Subject: Re: Questions about oop handling for Panama upcalls. >>> >>> Hi Erik, >>> >>> Thanks for the suggestion. >>> >>> The callee is a mix of JDK internal and user code. The user gives us >>> a method >>> handle that they want to turn into a native function pointer [1], >>> and we adapt >>> that using method handle combinators [2] to take only primitive >>> arguments >>> according to the registers in which the native calling convention >>> passes >>> arguments (essentially each primitive argument is a register value). >>> The >>> register values are then reconstructed into high-level arguments >>> (through >>> our MH adaptation), and passed to the user code. It's this adapted >>> method >>> handle that we call from the upcall stub. >>> >>> I guess what you're suggesting is that we have some internal Java >>> method >>> like this: >>>
>>>     static ... invoke(long methodHandle, ...) {
>>>         MethodHandle mh = resolveJObject(methodHandle);
>>>         return (...) mh.invokeExact(...);
>>>     }
>>>
>>> Which is then called from the upcall stub instead.
>>> >>> I think it could work maybe (would have to see how the performance >>> works >>> out), but we have to deal with different signatures, so would have >>> to use >>> bytecode spinning to generate these 'invoke' methods on demand, which >>> seems like maybe it's a worse medicine (in terms of complexity) than >>> adding >>> the correct oop handling in the VM. >>> >>> I would also just like to get a better understanding of how this is >>> supposed to >>> work in the first place (or how it works e.g. in the case of >>> nmethods), since I >>> had to implement the correct oop handling in the past as well when >>> implementing the intrinsics for down calls, and it's probably not >>> the last time I >>> have to deal with something like this... >>> >>> ? > Our current upcall stubs try to quack like an interpreter in >>> many ways, so >>> that it will look like an i-2-something call. I think you can either >>> try to do the >>> same quacking dance, to pass the oop to the callee >>> >>> So, I suppose interpreter argument oops are handled through another >>> mechanism than OopMaps, maybe something similar to >>> CompiledMethod::preserve_callee_argument_oops? >>> >>> Thanks, >>> Jorn >>> >>> [1] : >>> https://github.com/openjdk/panama-foreign/blob/foreign- >>> jextract/src/jdk.incubator.foreign/share/classes/jdk/incubator/foreign/CLink >>> >>> er.java#L224 >>> [2] : >>> https://github.com/openjdk/panama-foreign/blob/foreign- >>> jextract/src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/abi/Pr >>> >>> ogrammableUpcallHandler.java#L157 >>> >>> On 17/11/2021 10:42, Erik Osterlund wrote: >>>> Hi Jorn, >>>> >>>> So you have a jobject in the caller, resolve it, and then need to >>>> pass the >>> oop around as an argument to the callee. Our current upcall stubs >>> try to >>> quack like an interpreter in many ways, so that it will look like an >>> i-2- >>> something call. 
I think you can either try to do the same quacking >>> dance, to >>> pass the oop to the callee, or alternatively the primary question >>> for me >>> seems to be who is the callee? You have a very fixed format for the >>> call, >>> which makes me suspect the callee is some kind of JDK internal code. >>> Another way of dealing with this would be to pass the jobject as a >>> long and >>> just resolve it in the callee instead, if this is indeed JDK >>> internal code. Then >>> this becomes a problem that doesn't need to be solved at all. Just >>> sanity >>> checking. >>>> /Erik >>>> >>>>> -----Original Message----- >>>>> From: hotspot-dev? On Behalf Of >>>>> Jorn Vernee >>>>> Sent: Tuesday, 16 November 2021 18:51 To:hotspot- >>> dev at openjdk.java.net >>>>> Subject: Questions about oop handling for Panama upcalls. >>>>> >>>>> Hi, >>>>> >>>>> For panama-foreign upcalls we spin our own upcall stubs that wrap a >>>>> method handle VM entry for the actual upcall. I want to make sure I >>>>> have the oop handling correct on this. >>>>> >>>>> We receive a list of arguments from native code (all primitives, so >>>>> no oops to handle there), and then prefix that list with a >>>>> MethodHandle oop, before calling into the MH's VM entry. The MH oop >>>>> can be stored in three different >>>>> places: >>>>> >>>>> 1. The MH oop is stored in a global JNI handle, and then resolved >>>>> right before the upcall [1]. >>>>> 2. The MH oop is then stored in the first argument register j_rarg0 >>>>> for the call. >>>>> 3. During a deopt of the callee, the deoptimization code spills the >>>>> receiver (MH oop) into the frame of the upcall stub. (looks like the >>>>> extending of the frame that happens for instance in c2i adapters >>>>> doesn't make room for the receiver?). >>>>> >>>>> I don't think I need to do anything else for 1., but for 2. and 3. >>>>> there is currently no handling. I wanted to ask how those cases >>>>> should be handled, if at all. >>>>> >>>>> I think 2. 
could in theory be addressed by implementing >>>>> CodeBlob::preserve_callee_argument_oops. Though, it has been working >>>>> fine so far without this, so I'm wondering if this is even needed. Is >>>>> the caller or callee responsible for handling argument oops (seems to >>>>> be caller, from looking at >>> CompiledMethod::preserve_callee_argument_oops)? >>>>> Or does the caller just handle the receiver if there is one (since >>>>> deopt spills that into the callers frame)? The oop offset is passed >>>>> to an OopClosure in CompiledArgumentOopFinder::handle_oop_offset as >>>>> an oop* [2]. Does the argument register get spilled somewhere and the >>>>> oop needs to be patched in place at that address (by the OopClosure)? >>>>> Or is this just used to mark the oop as alive? (in the latter >>>>> case, the JNI >>> global should be enough I think). >>>>> I think 3. could be handled with an OopMap entry at the frame offset >>>>> where the receiver is spilled during a deopt of the callee? Should it >>>>> be an oop or a narrowOop, or does it depend on VM settings? FWIW, the >>>>> deopt code always seems to need a machine word (64-bits) to do the >>>>> spilling, so I think it's an oop? Do I need to zero out that part of >>>>> the frame when allocating the frame so that the GC doesn't mistake >>>>> some garbage that's in there for an oop? >>>>> >>>>> I have a POC patch here for reference [3], that implements the 2 >>>>> things above. This passes our test suite, but I'm not sure about the >>> correctness. >>>>> Looking at what JNI does for upcalls [4], I don't see how e.g. the >>>>> receiver argument that is put on the stack is handled, or what >>>>> happens when the callee deopts (though I think it would just >>>>> overwrite the value on the stack that's there already, since JNI >>>>> always seems to do interpreted calls, where we do compiled calls). >>>>> But, JNI/the call stub might be special cased elsewhere... 
>>>>> >>>>> Also, the oop is briefly stored in rscratch1 when resolving. I'm >>>>> interested to know when the GC can look at the frame and register >>>>> state, especially with concurrent GCs in mind. I'm assuming it's only >>>>> during the call to the MH VM entry (but the existence of >>> frame::safe_for_sender makes me less sure)? >>>>> AFAIK the call counts as a safepoint (with oop map for it typically >>>>> stored at the return offset). At this safepoint, the oop can only be >>>>> stored at one of the >>>>> 3 places listed at the start. >>>>> >>>>> Thanks, >>>>> Jorn >>>>> >>>>> [1] : >>>>> https://github.com/openjdk/panama-foreign/blob/foreign- >>>>> jextract/src/hotspot/cpu/x86/universalUpcallHandler_x86_64.cpp#L412-L >>>>> 416 >>>>> [2] : >>>>> >>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/ >>>>> fr >>>>> ame.cpp#L939-L946 >>>>> [3] : >>>>> https://github.com/openjdk/panama-foreign/compare/foreign- >>>>> memaccess+abi...JornVernee:Deopt_Crash >>>>> [4] : >>>>> >>> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGe >>>>> nerator_x86_64.cpp#L339 From darcy at openjdk.java.net Mon Nov 22 21:55:21 2021 From: darcy at openjdk.java.net (Joe Darcy) Date: Mon, 22 Nov 2021 21:55:21 GMT Subject: RFR: 8275063: Implementation of Foreign Function & Memory API (Second incubator) [v26] In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 12:09:30 GMT, Maurizio Cimadamore wrote: >> This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.java.net/jeps/419 > > Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 35 commits: > > - Merge branch 'master' into JEP-419 > - Fix javadoc issues found in CSR review > - Adopt blessed modofier order > - Merge branch 'master' into JEP-419 > - Revert removal of upcall MH customization > (This change caused spurious VM crashes, so reverting to baseline) > - Further tweak upcall safety considerations > - Clarify safety considerations for upcalls > - Rename MemorySegment::ofAddressNative to MemorySegment::ofAddress > (which is consistent with other restricted factories in VaList and NativeSymbol) > - Streamline javadoc for package-info > - * Add two new CLinker static methods to compute upcall/downcall method types > * Clarify section on CLinker downcall type > * Add section on CLinker safety guarantees > - ... and 25 more: https://git.openjdk.java.net/jdk/compare/d427c79d...29cc6c60 Marked as reviewed by darcy (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From sviswanathan at openjdk.java.net Tue Nov 23 01:35:28 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 23 Nov 2021 01:35:28 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 Message-ID: Currently 32-byte instructions are used for small array copy and clear. This can be optimized by using 64-byte instructions. Please review. 
Best Regards, Sandhya ------------- Commit messages: - 8277617: Optimize array copy and clear on x86_64 Changes: https://git.openjdk.java.net/jdk/pull/6512/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277617 Stats: 15 lines in 4 files changed: 2 ins; 0 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/6512.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512 PR: https://git.openjdk.java.net/jdk/pull/6512 From dholmes at openjdk.java.net Tue Nov 23 02:18:10 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 23 Nov 2021 02:18:10 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 01:23:04 GMT, Sandhya Viswanathan wrote: > Currently 32-byte instructions are used for small array copy and clear. > This can be optimized by using 64-byte instructions. > > Please review. > > Best Regards, > Sandhya This isn't my area but I'm a bit perplexed by the changes. AFAICS this patch does 2 things: 1. It changes all use of `AVX3Threshold` to `VM_Version::avx3_threshold()` 2. It defines `VM_Version::avx3_threshold()` as: ```static int avx3_threshold() { return (supports_serialize() ? 0: AVX3Threshold); }``` but I am at a loss to understand what `supports_serialize()` has to do with using 64-byte instructions for array copy and clear. ?? Thanks, David Plus some performance numbers would be useful. Thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From sviswanathan at openjdk.java.net Tue Nov 23 02:48:09 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 23 Nov 2021 02:48:09 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 02:14:46 GMT, David Holmes wrote: >> Currently 32-byte instructions are used for small array copy and clear. 
>> This can be optimized by using 64-byte instructions. >> >> Please review. >> >> Best Regards, >> Sandhya > > Plus some performance numbers would be useful. Thanks @dholmes-ora We see about 25% gain on a micro on our latest platform. There is no cpuid bit for this, so the closest was to check for the new serialize ISA supported on this platform. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From dholmes at openjdk.java.net Tue Nov 23 02:58:07 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 23 Nov 2021 02:58:07 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 01:23:04 GMT, Sandhya Viswanathan wrote: > Currently 32-byte instructions are used for small array copy and clear. > This can be optimized by using 64-byte instructions. > > Please review. > > Best Regards, > Sandhya But what exactly is it that you are checking for? What is the connection between the ISA version and the decision to effectively zero out AVX3Threshold? ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From sviswanathan at openjdk.java.net Tue Nov 23 04:28:07 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 23 Nov 2021 04:28:07 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 02:54:51 GMT, David Holmes wrote: >> Currently 32-byte instructions are used for small array copy and clear. >> This can be optimized by using 64-byte instructions. >> >> Please review. >> >> Best Regards, >> Sandhya > > But what exactly is it that you are checking for? What is the connection between the ISA version and the decision to effectively zero out AVX3Threshold? @dholmes-ora The Intel platforms that support this ISA have an improved implementation of 64-byte load/stores. I could not find a better way to check in the absence of a cpuid bit.
If it helps, I could further restrict it to (is_intel_family_core() && supports_serialize()). Also, I can add a comment towards this to the avx3_threshold() method. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From dholmes at openjdk.java.net Tue Nov 23 04:54:06 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 23 Nov 2021 04:54:06 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 04:25:23 GMT, Sandhya Viswanathan wrote: >> But what exactly is it that you are checking for? What is the connection between the ISA version and the decision to effectively zero out AVX3Threshold? > > @dholmes-ora The Intel platforms that support this ISA have an improved implementation of 64-byte load/stores. I could not find a better way to check in the absence of a cpuid bit. > If it helps, I could further restrict it to (is_intel_family_core() && supports_serialize()). Also, I can add a comment towards this to the avx3_threshold() method. @sviswa7 that further restriction and an explanatory comment would be appreciated. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From sviswanathan at openjdk.java.net Tue Nov 23 05:21:45 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 23 Nov 2021 05:21:45 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 [v2] In-Reply-To: References: Message-ID: <8DCW_z8u24RWqc6LhKRd6jXF8gYOF9rvY-AMtz4C2Is=.a983c0a5-a714-4a88-8cd8-dba8d65ac72a@github.com> > Currently 32-byte instructions are used for small array copy and clear. > This can be optimized by using 64-byte instructions. > > Please review.
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: restrict to Intel core and add comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6512/files - new: https://git.openjdk.java.net/jdk/pull/6512/files/54aa9cee..e0cb890d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=00-01 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6512.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512 PR: https://git.openjdk.java.net/jdk/pull/6512 From sviswanathan at openjdk.java.net Tue Nov 23 05:28:06 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 23 Nov 2021 05:28:06 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 04:50:42 GMT, David Holmes wrote: >> @dholmes-ora The Intel platforms that support this ISA have an improved implementation of 64-byte load/stores. I could not find a better way to check in the absence of a cpuid bit. >> If it helps, I could further restrict it to (is_intel_family_core() && supports_serialize()). Also, I can add a comment towards this to the avx3_threshold() method. > > @sviswa7 that further restriction and an explanatory comment would be appreciated. Thanks. @dholmes-ora I have implemented your review comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From jiefu at openjdk.java.net Tue Nov 23 06:09:05 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 23 Nov 2021 06:09:05 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 04:50:42 GMT, David Holmes wrote: >> @dholmes-ora The Intel platforms that support this ISA have an improved implementation of 64-byte load/stores.
I could not find a better way to check in the absence of a cpuid bit. >> If it helps, I could further restrict it to (is_intel_family_core() && supports_serialize()). Also, I can add a comment towards this to the avx3_threshold() method. > > @sviswa7 that further restriction and an explanatory comment would be appreciated. Thanks. > @dholmes-ora We see about 25% gain on a micro on our latest platform. There is no cpuid bit for this, so the closest was to check for the new serialize ISA supported on this platform. It would be better to add a jmh test for this opt. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From stuefe at openjdk.java.net Tue Nov 23 06:48:14 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 23 Nov 2021 06:48:14 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks In-Reply-To: <6HZO-x_TuA4XOmbGfIo7DCXFgZCSNU69FkpWT8WWtL8=.6c3180cf-ced1-43d3-966f-3f21e9d3bffe@github.com> References: <_r5qw_r-3Be7zUJuf4gcb10MFe9varAWAvix_CaJiYs=.758ad563-f2d2-466d-bcb5-1ccc6b547e94@github.com> <6HZO-x_TuA4XOmbGfIo7DCXFgZCSNU69FkpWT8WWtL8=.6c3180cf-ced1-43d3-966f-3f21e9d3bffe@github.com> Message-ID: On Sun, 17 Oct 2021 13:30:17 GMT, Zhengyu Gu wrote: >>> > > > Sorry, we already have GuardedMemory for detecting buffer overrun, why introduce a new one? >>> > > >>> > > >>> > > GuardedMemory has a number of disadvantages, and I'd like to remove it in favor of NMT doing buffer overrun checks. For my full reasoning, please see my reasoning in the umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301: >>> > > Disadvantages of the current solution: >>> > > >>> > > * We have no way to do C-heap checking in release builds. But there, it is sorely missed. We ship release VMs, and those VMs get used in myriad ways with a lot of faulty third-party native code.
I would love to be able to flip a product switch at a customer site and have some basic C-heap checks done, without relegating to external tools or debug c-libs. >>> > > * The debug-only guards in os::malloc() are quite fat, really, a whopping 48 bytes per allocation on 64-bit, 40 bytes on 32-bit. That is for guarding alone. They distort the memory allocation picture, since blowing up every allocation this way causes the underlying libc to do different things. Therefore we have different memory layouts and allocation patterns between debug and release. In addition, we have different code paths too, e.g. in debug os::realloc calls os::malloc + os::free whereas in release builds it calls directly into libc ::realloc. All that means that in debug builds we test something different than what we ship in release builds. >>> > > * The canary in the headers of the debug-only guards do not directly precede the user portion of the data, so we won't catch negative buffer overflows of only a few bytes. >>> > > * The guarding added by CheckJNICalls is unnecessarily expensive too, since it copies the memory around, handing a copy of the guarded memory up to the caller. >>> > > * The fact that three different code sections all do malloc headers incurs unnecessary costs, and the code is unnecessarily complex. It makes also statistics difficult to understand since the silent overhead can be large (compare the rise in RSS with the rise in NMT allocations in a debug build). >>> > > * None of the current overflow checkers print out hex dumps of the violated memory. That is what the libc usually does and it is very useful. >>> > > >>> > > Thanks, Thomas >>> > >>> > >>> > p.s. I contemplated to do NMT overflow checks and removal of old guarding code in one RFE but was concerned that it would be too confusing and get stuck in review limbo. Maybe that was wrong. But this RFE here makes more sense when viewed as part of a whole. >>> >>> Thanks for explanation. 
So, buffer overrun detection is now only available when NMT is on, vs. always on with GuardedMemory in debug build. Right? >> >> Well, not with this patch obviously. But yes, that would be my proposal. To get "always-on", we could switch NMT on by default in debug builds. "summary" level is not really expensive at all, it uses less memory than GuardedMemory does, and the per-flag accounting does not really add much overhead (GuardedMemory also does some accounting btw). >> >> Though tbh my first priority is to give us overflow checks in release builds. If we only do that and leave GuardedMemory in place I would be happy already. I had two customer cases very recently with heap overwriters, one of which I misused NMT to trigger a crash and analyze the core. A neighboring (non-VM-allocated) block was overwriting the following (VM allocated) heap block. >> >> Cheers, Thomas > >> > > > > Sorry, we already have GuardedMemory for detecting buffer overrun, why introduce a new one? >> > > > >> > > > >> > > > GuardedMemory has a number of disadvantages, and I'd like to remove it in favor of NMT doing buffer overrun checks. For my full reasoning, please see my reasoning in the umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301: >> > > > Disadvantages of the current solution: >> > > > >> > > > * We have no way to do C-heap checking in release builds. But there, it is sorely missed. We ship release VMs, and those VMs get used in myriad ways with a lot of faulty third-party native code. I would love to be able to flip a product switch at a customer site and have some basic C-heap checks done, without relegating to external tools or debug c-libs. >> > > > * The debug-only guards in os::malloc() are quite fat, really, a whopping 48 bytes per allocation on 64-bit, 40 bytes on 32-bit. That is for guarding alone. They distort the memory allocation picture, since blowing up every allocation this way causes the underlying libc to do different things. 
Therefore we have different memory layouts and allocation patterns between debug and release. In addition, we have different code paths too, e.g. in debug os::realloc calls os::malloc + os::free whereas in release builds it calls directly into libc ::realloc. All that means that in debug builds we test something different than what we ship in release builds. >> > > > * The canary in the headers of the debug-only guards do not directly precede the user portion of the data, so we won't catch negative buffer overflows of only a few bytes. >> > > > * The guarding added by CheckJNICalls is unnecessarily expensive too, since it copies the memory around, handing a copy of the guarded memory up to the caller. >> > > > * The fact that three different code sections all do malloc headers incurs unnecessary costs, and the code is unnecessarily complex. It makes also statistics difficult to understand since the silent overhead can be large (compare the rise in RSS with the rise in NMT allocations in a debug build). >> > > > * None of the current overflow checkers print out hex dumps of the violated memory. That is what the libc usually does and it is very useful. >> > > > >> > > > Thanks, Thomas >> > > >> > > >> > > p.s. I contemplated to do NMT overflow checks and removal of old guarding code in one RFE but was concerned that it would be too confusing and get stuck in review limbo. Maybe that was wrong. But this RFE here makes more sense when viewed as part of a whole. >> > >> > >> > Thanks for explanation. So, buffer overrun detection is now only available when NMT is on, vs. always on with GuardedMemory in debug build. Right? >> >> Well, not with this patch obviously. But yes, that would be my proposal. To get "always-on", we could switch NMT on by default in debug builds. "summary" level is not really expensive at all, it uses less memory than GuardedMemory does, and the per-flag accounting does not really add much overhead (GuardedMemory also does some accounting btw). 
>> >> Though tbh my first priority is to give us overflow checks in release builds. If we only do that and leave GuardedMemory in place I would be happy already. I had two customer cases very recently with heap overwriters, one of which I misused NMT to trigger a crash and analyze the core. A neighboring (non-VM-allocated) block was overwriting the following (VM allocated) heap block. >> >> Cheers, Thomas > > I have no problem on technical side. Changing NMT default value, I believe, needs CSR. Probably should start with a CSR to get a consensus. > > Thanks. > > -Zhengyu Thank you @zhengyu123 and @simonis! I'll do one round of stress tests more, then push. ------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From dholmes at openjdk.java.net Tue Nov 23 06:52:07 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 23 Nov 2021 06:52:07 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 In-Reply-To: References: Message-ID: <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com> On Tue, 23 Nov 2021 05:24:41 GMT, Sandhya Viswanathan wrote: >> @sviswa7 that further restriction and an explanatory comment would be appreciated. Thanks. > > @dholmes-ora I have implemented your review comments. Sorry @sviswa7 but could you explain in the comment why/how `avx3_threshold` reporting zero impacts the use 64-byte load/store - the connection is not at all obvious for anyone not fully conversant with AVX3 and how it is used by the code. Thanks. 
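For readers following the avx3_threshold question, the connection can be sketched as follows. This is only a model under stated assumptions: `CpuModel`, `model_avx3_threshold` and `model_vector_width_bytes` are hypothetical names, the threshold is assumed to be the minimum size in bytes at which callers choose 64-byte instructions (an assumption about how the value is consumed, not something stated in the thread), and the value 4096 in the usage note is just an example.

```cpp
#include <cassert>

// Hypothetical feature flags; in the VM the real checks live in VM_Version.
struct CpuModel {
  bool is_intel_family_core;
  bool supports_serialize;
};

// Mirrors the shape of the avx3_threshold() described in the thread after the
// v2 restriction: report 0 on the newer Intel platforms, otherwise the
// configured flag value.
inline int model_avx3_threshold(const CpuModel& cpu, int avx3_threshold_flag) {
  assert(avx3_threshold_flag >= 0);
  return (cpu.is_intel_family_core && cpu.supports_serialize)
             ? 0
             : avx3_threshold_flag;
}

// Assumed consumer: a copy or clear of size_in_bytes uses 64-byte
// instructions once it reaches the threshold, and 32-byte ones otherwise.
inline int model_vector_width_bytes(const CpuModel& cpu,
                                    int avx3_threshold_flag,
                                    int size_in_bytes) {
  assert(size_in_bytes >= 0);
  return size_in_bytes >= model_avx3_threshold(cpu, avx3_threshold_flag) ? 64
                                                                         : 32;
}
```

Under these assumptions, with a flag value of 4096 a 128-byte copy stays on 32-byte instructions on an older CPU, while a CPU that reports the serialize ISA sees a threshold of 0 and picks 64-byte instructions for the same copy.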
------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From pli at openjdk.java.net Tue Nov 23 08:12:07 2021 From: pli at openjdk.java.net (Pengfei Li) Date: Tue, 23 Nov 2021 08:12:07 GMT Subject: RFR: 8277168: AArch64: Enable arraycopy partial inlining with SVE In-Reply-To: <82Kgtn4RllwF2ifvmwtaQaeG9ADXeUoq290BKnd8PZ4=.ed410c36-2f5c-4b29-9d96-07d33ac872ee@github.com> References: <82Kgtn4RllwF2ifvmwtaQaeG9ADXeUoq290BKnd8PZ4=.ed410c36-2f5c-4b29-9d96-07d33ac872ee@github.com> Message-ID: <0NpXDvx0PPQgOnuxjlDayD-n5Y9nojMQhPRul1ysKqk=.b4e4fdc9-10bc-485f-843e-22c4cc360647@github.com> On Fri, 19 Nov 2021 08:07:13 GMT, Jatin Bhateja wrote: >> The x86 failure is caused by a recent commit (see [JDK-8277324](https://bugs.openjdk.java.net/browse/JDK-8277324)) and unrelated to this PR. > > Hi @pfustc , common type system changes looks good to me. Thank you for looking at my PR. This C2 technique was originally developed by @jatin-bhateja from Intel to optimize small-sized memory copy with x86 AVX-512 masked vector instructions. Now I propose to enable it on AArch64 with SVE. Yes, it has benefit only if the copy size is less than the size of a vector. It's 512 bits on x86, but on AArch64 SVE the max copy size it can benefit depends on the hardware's implementation of the scalable vector register (from 128 bits to 2048 bits). @theRealAph , do you approve this PR? or any specific feedback or suggestion? 
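The size condition described above can be illustrated with a scalar emulation of one masked vector copy. This is a sketch only, not the C2 implementation: `model_fits_single_vector` and `model_masked_copy` are hypothetical names, byte lanes are assumed, and "fits in one vector" is taken to mean a copy size no larger than the vector size, which is an assumption about the exact boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when a copy of `bytes` fits in a single vector register of
// `vector_bytes` (16 to 256 bytes for SVE, 64 bytes for AVX-512).
inline bool model_fits_single_vector(std::size_t bytes,
                                     std::size_t vector_bytes) {
  return bytes <= vector_bytes;
}

// Scalar emulation of one masked load/store pair: lane i is active only if
// i < bytes, so nothing beyond the requested length is read or written.
// Returns false when the copy is too large for the single-vector path.
inline bool model_masked_copy(std::uint8_t* dst, const std::uint8_t* src,
                              std::size_t bytes, std::size_t vector_bytes) {
  assert(dst != nullptr && src != nullptr);
  if (!model_fits_single_vector(bytes, vector_bytes)) {
    return false;  // a real caller would fall back to the regular copy path
  }
  for (std::size_t lane = 0; lane < vector_bytes; ++lane) {
    const bool active = lane < bytes;  // mask / predicate bit for this lane
    if (active) {
      dst[lane] = src[lane];
    }
  }
  return true;
}
```

With a 16-byte vector, a 10-byte copy is handled in one masked operation, while a 20-byte copy is rejected and has to take the regular path.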
-------------

PR: https://git.openjdk.java.net/jdk/pull/6444

From mdoerr at openjdk.java.net  Tue Nov 23 09:32:08 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Tue, 23 Nov 2021 09:32:08 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v10]
In-Reply-To: <3DyX38fUwXmYfYuInLP-xhm1toijhtr2U7pHK2zhNqU=.b91e17bd-bea6-4323-96e0-03c59e3f0573@github.com>
References: <3DyX38fUwXmYfYuInLP-xhm1toijhtr2U7pHK2zhNqU=.b91e17bd-bea6-4323-96e0-03c59e3f0573@github.com>
Message-ID:

On Thu, 18 Nov 2021 10:21:01 GMT, Volker Simonis wrote:

>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>>
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:
>
>  - Fix jit/t/t105/t105.java to also use -XX:-OptimizeImplicitExceptions in addition to -XX:-OmitStacktracesInFastThrow
>  - Fix IR Framework test Traps::classCheck() which now behaves differently with -XX:+OptimizeImplicitExceptions
>  - Fix build issue for minimal/zero build one more time
>  - Minor enhancements and fixes requested by Martin
>  - Add new WhiteBox functionality to sun/hotspot/WhiteBox.java as well to avoid warnings in the tests which are still using it.
>  - Fix build issue for minimal/zero build
>  - Added jtreg test and extended the Whitebox API to export decompile, deopt and trap counters
>  - Fix special case where we're creating an implicit exception for a regular invoke* bytecode
>  - Minor updates as requested by @TheRealMDoerr
>  - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow

I think this workaround is ok. C2 currently doesn't support extended exception messages other than NullPointerExceptions. If this change gets accepted, I think we should add C2 support for other primitive Exceptions.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From pliden at openjdk.java.net  Tue Nov 23 09:45:10 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Tue, 23 Nov 2021 09:45:10 GMT
Subject: RFR: 8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper
In-Reply-To:
References:
Message-ID:

On Mon, 22 Nov 2021 13:49:02 GMT, Erik Österlund wrote:

> The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is constructed right before running a parallel operation with a work gang, but freed in the destructor of the VM_HeapDumper. This means it is created on one thread and deleted on another thread.
This becomes a bit problematic when a parallel object iterator implementation uses a ThreadsListHandle (which is indeed the case for ZGC). This patch changes ParallelObjectIterator to be a StackObj, carrying a ParallelObjectIteratorImpl object, which is never exposed publicly. This ensures that construction and destruction of the internal object iterator is scoped like RAII objects, hence complying with how ThreadsListHandle is supposed to be used. Looks good. ------------- Marked as reviewed by pliden (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6501 From stefank at openjdk.java.net Tue Nov 23 09:52:06 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 23 Nov 2021 09:52:06 GMT Subject: RFR: 8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 13:49:02 GMT, Erik Österlund wrote: > The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is constructed right before running a parallel operation with a work gang, but freed in the destructor of the VM_HeapDumper. This means it is created on one thread and deleted on another thread. This becomes a bit problematic when a parallel object iterator implementation uses a ThreadsListHandle (which is indeed the case for ZGC). This patch changes ParallelObjectIterator to be a StackObj, carrying a ParallelObjectIteratorImpl object, which is never exposed publicly. This ensures that construction and destruction of the internal object iterator is scoped like RAII objects, hence complying with how ThreadsListHandle is supposed to be used. Marked as reviewed by stefank (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/6501 From eosterlund at openjdk.java.net Tue Nov 23 13:42:06 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 23 Nov 2021 13:42:06 GMT Subject: RFR: 8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper In-Reply-To: References: Message-ID: <8246XgMZ_pumK-BgCx9osG0N9jvJxNGmcJVw8s_0oqo=.d723b7d4-ac0e-4897-af9f-65524377ab87@github.com> On Tue, 23 Nov 2021 09:42:10 GMT, Per Liden wrote: >> The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is constructed right before running a parallel operation with a work gang, but freed in the destructor of the VM_HeapDumper. This means it is created on one thread and deleted on another thread. This becomes a bit problematic when a parallel object iterator implementation uses a ThreadsListHandle (which is indeed the case for ZGC). This patch changes ParallelObjectIterator to be a StackObj, carrying a ParallelObjectIteratorImpl object, which is never exposed publicly. This ensures that construction and destruction of the internal object iterator is scoped like RAII objects, hence complying with how ThreadsListHandle is supposed to be used. > > Looks good. Thanks for the reviews, @pliden and @stefank! ------------- PR: https://git.openjdk.java.net/jdk/pull/6501 From eosterlund at openjdk.java.net Tue Nov 23 14:22:46 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 23 Nov 2021 14:22:46 GMT Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts Message-ID: The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM. 
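To illustrate why the scoping order matters, here is a minimal stand-alone sketch. It does not use the HotSpot classes: `ToyThread`, `ToyBlockInVM` and `ToyNoSafepointLocker` are hypothetical stand-ins for the thread, `ThreadBlockInVM` and a locker on a non-safepoint-checking lock, and the "assert" is only recorded in a flag rather than aborting.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for illustration only; these are NOT the HotSpot types.
struct ToyThread {
  int  no_safepoint_depth = 0;   // > 0 while a non-safepoint-checking lock is held
  bool blocked            = false;
  bool assert_fired       = false;  // records what a debug build would assert on
};

// Models ThreadBlockInVM: the thread is "blocked" for the lifetime of the scope.
class ToyBlockInVM {
  ToyThread& _t;
 public:
  explicit ToyBlockInVM(ToyThread& t) : _t(t) {
    // Transitioning to blocked while a no-safepoint lock is held is the bad case.
    if (_t.no_safepoint_depth > 0) _t.assert_fired = true;
    _t.blocked = true;
  }
  ~ToyBlockInVM() { _t.blocked = false; }
};

// Models a locker for a non-safepoint-checking lock: holding it implies
// "no safepoint allowed" until the scope ends.
class ToyNoSafepointLocker {
  ToyThread&  _t;
  std::mutex& _m;
 public:
  ToyNoSafepointLocker(ToyThread& t, std::mutex& m) : _t(t), _m(m) {
    _m.lock();
    ++_t.no_safepoint_depth;
  }
  ~ToyNoSafepointLocker() {
    --_t.no_safepoint_depth;
    _m.unlock();
  }
};

// Order before the fix: take the lock, then transition to blocked.
bool lock_then_block_fires_assert() {
  ToyThread t;
  std::mutex m;
  {
    ToyNoSafepointLocker ml(t, m);
    ToyBlockInVM tbivm(t);
  }
  return t.assert_fired;
}

// Order after the fix: transition to blocked first, take the lock inside.
bool block_then_lock_fires_assert() {
  ToyThread t;
  std::mutex m;
  {
    ToyBlockInVM tbivm(t);
    ToyNoSafepointLocker ml(t, m);
  }
  return t.assert_fired;
}
```

In this toy model only the first ordering trips the recorded assert, which mirrors the description above; the real patch is in the linked PR.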
------------- Commit messages: - 8277631: ZGC: CriticalMetaspaceAllocation asserts Changes: https://git.openjdk.java.net/jdk/pull/6520/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6520&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277631 Stats: 35 lines in 2 files changed: 30 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/6520.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6520/head:pull/6520 PR: https://git.openjdk.java.net/jdk/pull/6520 From pliden at openjdk.java.net Tue Nov 23 14:37:12 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 23 Nov 2021 14:37:12 GMT Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 14:14:31 GMT, Erik Österlund wrote: > The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM. Marked as reviewed by pliden (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6520 From eosterlund at openjdk.java.net Tue Nov 23 14:38:15 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 23 Nov 2021 14:38:15 GMT Subject: Integrated: 8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper In-Reply-To: References: Message-ID: On Mon, 22 Nov 2021 13:49:02 GMT, Erik Österlund wrote: > The VM_HeapDumper code uses a C heap allocated ParallelObjectIterator. It is constructed right before running a parallel operation with a work gang, but freed in the destructor of the VM_HeapDumper. This means it is created on one thread and deleted on another thread.
This becomes a bit problematic when a parallel object iterator implementation uses a ThreadsListHandle (which is indeed the case for ZGC). This patch changes ParallelObjectIterator to be a StackObj, carrying a ParallelObjectIteratorImpl object, which is never exposed publicly. This ensures that construction and destruction of the internal object iterator is scoped like RAII objects, hence complying with how ThreadsListHandle is supposed to be used. This pull request has now been integrated. Changeset: f4dc03ea Author: Erik Österlund URL: https://git.openjdk.java.net/jdk/commit/f4dc03ea6de327425ff265c3d2ec16ea7b0e1634 Stats: 70 lines in 15 files changed: 35 ins; 11 del; 24 mod 8276696: ParallelObjectIterator freed at the wrong time in VM_HeapDumper Reviewed-by: pliden, stefank ------------- PR: https://git.openjdk.java.net/jdk/pull/6501 From eosterlund at openjdk.java.net Tue Nov 23 14:46:08 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 23 Nov 2021 14:46:08 GMT Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts In-Reply-To: References: Message-ID: <9tidE4J6H_rFLh6OWbKAEzZziYYULySBJJr9RVgcnlI=.a1b80d06-bb9f-45be-9b83-9ad6a1036788@github.com> On Tue, 23 Nov 2021 14:33:47 GMT, Per Liden wrote: >> The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM. > > Marked as reviewed by pliden (Reviewer). Thanks for the reviews, @pliden and @stefank!
------------- PR: https://git.openjdk.java.net/jdk/pull/6520 From stefank at openjdk.java.net Tue Nov 23 14:46:08 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 23 Nov 2021 14:46:08 GMT Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts In-Reply-To: References: Message-ID: <_W_qcnx6Y9G12rJKxfvrKbuFDnUJcs6dC0F8agu5ueE=.63b4b4dd-0396-4431-bf7b-8c04e2f380e0@github.com> On Tue, 23 Nov 2021 14:14:31 GMT, Erik Österlund wrote: > The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM. Marked as reviewed by stefank (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/6520 From jiefu at openjdk.java.net Tue Nov 23 16:04:23 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 23 Nov 2021 16:04:23 GMT Subject: RFR: 8277652: SIGSEGV in ShenandoahBarrierC2Support::verify_raw_mem for malformed control flow graph Message-ID: Hi all, `ShenandoahBarrierC2Support::verify_raw_mem` crashes due to `u->unique_ctrl_out()` [1] returns NULL for malformed control flow graph. It can be reproduced by running `compiler/vectorapi/TestIntrinsicBailOut.java` with `-XX:+UseShenandoahGC`. It would be better to fix it. Thanks.
Best regards, Jie [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L1925 ------------- Commit messages: - 8277652: SIGSEGV in ShenandoahBarrierC2Support::verify_raw_mem for malformed control flow graph Changes: https://git.openjdk.java.net/jdk/pull/6525/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6525&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277652 Stats: 13 lines in 2 files changed: 13 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6525.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6525/head:pull/6525 PR: https://git.openjdk.java.net/jdk/pull/6525 From rkennke at openjdk.java.net Tue Nov 23 16:24:07 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 23 Nov 2021 16:24:07 GMT Subject: RFR: 8277652: SIGSEGV in ShenandoahBarrierC2Support::verify_raw_mem for malformed control flow graph In-Reply-To: References: Message-ID: <81o2YKFQvTE2C9qqBBDBjC5L1dNyPMRTJw1CcTdD2SA=.6946cbb3-de08-46b7-9724-7c39a989efc3@github.com> On Tue, 23 Nov 2021 15:59:00 GMT, Jie Fu wrote: > Hi all, > > `ShenandoahBarrierC2Support::verify_raw_mem` crashes due to `u->unique_ctrl_out()` [1] returns NULL for malformed control flow graph. > It can be reproduced by running `compiler/vectorapi/TestIntrinsicBailOut.java` with `-XX:+UseShenandoahGC`. > It would be better to fix it. > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L1925 Thank you, Jie! I am currently working on a change that would make LRB runtime call not consume or produce raw memory at all, and would obsolete your change. See #6526 . 
-------------

PR: https://git.openjdk.java.net/jdk/pull/6525

From eastig at amazon.co.uk  Tue Nov 23 17:34:44 2021
From: eastig at amazon.co.uk (Astigeevich, Evgeny)
Date: Tue, 23 Nov 2021 17:34:44 +0000
Subject: RFC: improving NMethod code locality in CodeCache
Message-ID: <18BB091D-7983-48B1-BD0D-A333D8B81226@amazon.com>

Hello,

We'd like to discuss a proposal for improving NMethod code locality in CodeCache.

We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded.

The current NMethod layout is contiguous and consists of the following sections:

* Header: This is the C++ part of NMethod: class members and other C++ stuff. Its size is "sizeof(NMethod)". On JDK 17 arm64 it is 344 bytes. On x86_64 it is 352 bytes.
* Relocation
* Constant pool
* Instructions (main code)
* Stub code
* Oops
* Metadata: Class related metadata
* Scopes data: Debugging information
* Scopes pcs: Debugging information
* Dependencies
* Handler table: Exception handler table
* Nul chk table: Implicit Null Pointer exception table
* Speculations
* JVMCI data

We collected the section sizes of C2 nmethods in the DaCapo and Renaissance benchmarks on x86_64 and arm64. The C2 methods were obtained with "-XX:+LogCompilation".
Summary of results for jdk17 with tiered compilation:

* DaCapo:
  * arm64 (full data https://github.com/eastig/codecache/blob/master/jdk17/dacapo_c2_sizes_arm64.csv):

    +---------------------+---------+------------+-----------+
    |                     |   min   |    max     |  median   |
    +---------------------+---------+------------+-----------+
    | C2 nmethods         |     152 |       5215 |       916 |
    | Total size - bytes  | 271,576 | 38,367,872 | 4,072,616 |
    +---------------------+---------+------------+-----------+

    Proportion of the total size of a section vs C2 nmethods total size

    +---------------+-------+-------+--------+
    | Section       |  min  |  max  | median |
    +---------------+-------+-------+--------+
    | header        |  4.7% | 19.3% |   8.0% |
    | consts        |  0.0% |  0.1% |   0.0% |
    | instrs        | 39.7% | 49.7% |  44.5% |
    | stub code     |  8.9% | 11.3% |  10.1% |
    | oops          |  0.2% |  0.4% |   0.3% |
    | metadata      |  2.0% |  3.0% |   2.3% |
    | scopes data   | 12.2% | 18.6% |  15.9% |
    | scopes pcs    |  7.8% |  9.0% |   8.4% |
    | deps          |  0.3% |  0.8% |   0.5% |
    | handler table |  1.3% |  3.3% |   2.1% |
    | nul_chk table |  1.0% |  1.6% |   1.6% |
    +---------------+-------+-------+--------+

  * x86_64 (full data https://github.com/eastig/codecache/blob/master/jdk17/dacapo_c2_sizes_x86_64.csv):

    +---------------------+---------+------------+-----------+
    |                     |   min   |    max     |  median   |
    +---------------------+---------+------------+-----------+
    | C2 nmethods         |     155 |       5135 |       889 |
    | Total size - bytes  | 264,800 | 35,026,312 | 3,985,744 |
    +---------------------+---------+------------+-----------+

    Proportion of the total size of a section vs C2 nmethods total size

    +---------------+-------+-------+--------+
    | Section       |  min  |  max  | median |
    +---------------+-------+-------+--------+
    | header        |  5.2% | 20.6% |   8.3% |
    | consts        |  0.0% |  0.6% |   0.1% |
    | instrs        | 49.2% | 60.7% |  55.3% |
    | stub code     |  1.1% |  1.9% |   1.4% |
    | oops          |  0.1% |  0.3% |   0.2% |
    | metadata      |  1.6% |  2.9% |   2.0% |
    | scopes data   | 12.2% | 19.6% |  16.8% |
    | scopes pcs    |  7.8% |  9.2% |   8.5% |
    | deps          |  0.3% |  0.8% |   0.5% |
    | handler table |  1.5% |  3.5% |   2.0% |
    | nul_chk table |  0.9% |  1.6% |   1.1% |
    +---------------+-------+-------+--------+

* Renaissance
  * arm64 (full data https://github.com/eastig/codecache/blob/master/jdk17/renaissance_c2_sizes_arm64.csv):

    +---------------------+---------+------------+-----------+
    |                     |   min   |    max     |  median   |
    +---------------------+---------+------------+-----------+
    | C2 nmethods         |     155 |       7447 |      1198 |
    | Total size - bytes  | 366,248 | 52,840,528 | 4,989,392 |
    +---------------------+---------+------------+-----------+

    Proportion of the total size of a section vs C2 nmethods total size

    +---------------+-------+-------+--------+
    | Section       |  min  |  max  | median |
    +---------------+-------+-------+--------+
    | header        |  4.8% | 14.6% |   8.5% |
    | consts        |  0.0% |  0.1% |   0.0% |
    | instrs        | 35.7% | 45.6% |  42.8% |
    | stub code     |  8.3% | 12.0% |  10.1% |
    | oops          |  0.2% |  0.6% |   0.4% |
    | metadata      |  2.0% |  4.1% |   3.0% |
    | scopes data   | 12.4% | 20.8% |  16.1% |
    | scopes pcs    |  7.8% |  8.9% |   8.4% |
    | deps          |  0.4% |  1.0% |   0.5% |
    | handler table |  1.2% |  3.9% |   2.4% |
    | nul_chk table |  0.9% |  1.3% |   1.1% |
    +---------------+-------+-------+--------+

  * x86_64 (full data https://github.com/eastig/codecache/blob/master/jdk17/renaissance_c2_sizes_x86_64.csv):

    +---------------------+---------+------------+-----------+
    |                     |   min   |    max     |  median   |
    +---------------------+---------+------------+-----------+
    | C2 nmethods         |     158 |       7242 |       938 |
    | Total size - bytes  | 354,952 | 47,019,560 | 3,791,764 |
    +---------------------+---------+------------+-----------+

    Proportion of the total size of a section vs C2 nmethods total size

    +---------------+-------+-------+--------+
    | Section       |  min  |  max  | median |
    +---------------+-------+-------+--------+
    | header        |  5.4% | 15.7% |   9.7% |
    | consts        |  0.0% |  0.1% |   0.0% |
    | instrs        | 46.1% | 54.4% |  52.7% |
    | stub code     |  1.3% |  1.9% |   1.4% |
    | oops          |  0.2% |  0.5% |   0.3% |
    | metadata      |  1.9% |  3.4% |   2.6% |
    | scopes data   | 12.7% | 23.6% |  17.4% |
    | scopes pcs    |  8.0% |  9.4% |   8.6% |
    | deps          |  0.4% |  1.0% |   0.5% |
    | handler table |  1.3% |  4.0% |   2.5% |
    | nul_chk table |  1.0% |  1.4% |   1.2% |
    +---------------+-------+-------+--------+

The data show that due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. The data also show that the largest contributors of non-executable data are the header and scopes sections. Arm64 vs x86_64 looks consistent except for the stub code. On arm64 the size of the stub code is 4-5 times bigger.

We'd like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data. According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) "Support non-continuous CodeBlobs in HotSpot", NMethod sections can be located in different places of memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html.

Separating code will complicate maintenance of the CodeCache. Different parts of memory for an nmethod need to be allocated/released.

There is JDK-7072317 "move metadata from CodeCache" (https://bugs.openjdk.java.net/browse/JDK-7072317), under which the implementation work can be done.

There can be different approaches for the implementation:

1. What to separate:
   a. All code (main plus stub) from other sections.
   b. Or only main code because this is the code where an application should spend most of the time.
   c. Or the header and scope sections.
2. Where to put:
   a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin.
   b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset.
   c. Or in a completely different place (C-heap, Metaspace,...)
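As a rough illustration of option 2b only, the following stand-alone sketch shows a two-ended bump allocator over a single segment: code is handed out from the low end upwards and non-executable data from the high end downwards. `TwoEndedSegment`, `alloc_code`, `alloc_data` and `kNoSpace` are hypothetical names for this sketch and do not correspond to existing CodeCache or CodeHeap APIs; freeing, alignment and segment granularity are deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>

// Sentinel returned when the segment cannot satisfy a request (sketch-only name).
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Hypothetical two-ended bump allocator over one segment of `size` bytes.
// Offsets are relative to the segment start.
class TwoEndedSegment {
  std::size_t _code_top;     // first free offset above the code region
  std::size_t _data_bottom;  // lowest offset currently used by the data region
 public:
  explicit TwoEndedSegment(std::size_t size) : _code_top(0), _data_bottom(size) {}

  // Bytes still free between the two regions.
  std::size_t free_bytes() const { return _data_bottom - _code_top; }

  // Executable code grows from low offsets upwards, so it stays dense.
  std::size_t alloc_code(std::size_t n) {
    if (n > free_bytes()) return kNoSpace;
    std::size_t offset = _code_top;
    _code_top += n;
    return offset;
  }

  // Non-executable data (header, scopes, ...) grows from the top downwards.
  std::size_t alloc_data(std::size_t n) {
    if (n > free_bytes()) return kNoSpace;
    _data_bottom -= n;
    return _data_bottom;
  }
};
```

With a 1024-byte segment, two code allocations of 100 and 200 bytes land at offsets 0 and 100 regardless of any data allocated in between, which is the locality property the option is after.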
It needs to be investigated if the separation of sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property. We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317. Comments welcome! Thanks, Evgeny Astigeevich, AWS Corretto Team Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. From sviswanathan at openjdk.java.net Tue Nov 23 17:52:40 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 23 Nov 2021 17:52:40 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 [v3] In-Reply-To: References: Message-ID: > Currently 32-byte instructions are used for small array copy and clear. > This can be optimized by using 64-byte instructions. > > Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: update comment for avx3_threshold() with more details ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6512/files - new: https://git.openjdk.java.net/jdk/pull/6512/files/e0cb890d..c90e7004 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=01-02 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6512.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512 PR: https://git.openjdk.java.net/jdk/pull/6512 From jbhateja at openjdk.java.net Tue Nov 23 19:06:15 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Tue, 23 Nov 2021 19:06:15 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 [v3] In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 17:52:40 GMT, Sandhya Viswanathan wrote: >> Currently 32-byte instructions are used for small array copy and clear. >> This can be optimized by using 64-byte instructions. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > update comment for avx3_threshold() with more details src/hotspot/cpu/x86/vm_version_x86.hpp line 920: > 918: // is set to 0 for these platforms. > 919: static int avx3_threshold() { return ((is_intel_family_core() && > 920: supports_serialize()) ? 0: AVX3Threshold); } Hi @sviswa7 , Should we not return a zero threshold only if user does not explicitly set AVX3Threshold i.e. in default case. 
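To make the question concrete, here is a small stand-alone sketch contrasting the two behaviours. It is not HotSpot code: `ToyFlag`, `ToyCpu` and both functions are hypothetical stand-ins, and the `is_default` field only models the idea of knowing whether the user set the flag on the command line.

```cpp
#include <cassert>

// Hypothetical model of a VM flag that remembers whether the user set it.
struct ToyFlag {
  int  value;
  bool is_default;  // true when the user did not set the flag explicitly
};

// Hypothetical model of the CPU feature checks used in the quoted code.
struct ToyCpu {
  bool intel_core_family;
  bool supports_serialize;
};

// Behaviour of the quoted snippet: the platform check always wins.
int threshold_always_override(const ToyCpu& cpu, const ToyFlag& flag) {
  return (cpu.intel_core_family && cpu.supports_serialize) ? 0 : flag.value;
}

// Behaviour suggested in the question: only override the default value,
// and keep whatever the user explicitly asked for.
int threshold_respecting_user(const ToyCpu& cpu, const ToyFlag& flag) {
  bool platform_prefers_zero = cpu.intel_core_family && cpu.supports_serialize;
  return (platform_prefers_zero && flag.is_default) ? 0 : flag.value;
}
```

With a user-set value (4096 here is just an example number), the first function still returns 0 on the newer platform while the second returns the user's value.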
------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From sviswanathan at openjdk.java.net Tue Nov 23 22:24:07 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 23 Nov 2021 22:24:07 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 In-Reply-To: References: Message-ID: <-cmJjHI8NnKQ0YbPeHP_aRW7J797ZT38ZS9dCeGwdSw=.62291db3-14bd-4ed5-ac01-6f783ab5c5fe@github.com> On Tue, 23 Nov 2021 06:05:48 GMT, Jie Fu wrote: >> @sviswa7 that further restriction and an explanatory comment would be appreciated. Thanks. > >> @dholmes-ora We see about 25% gain on a micro on our latest platform. There is no cpuid bit for this, so the closest was to check for the new serialize ISA supported on this platform. > > It would be better to add a jmh test for this opt. > Thanks. @DamonFool There are jmh tests for Arraycopy in test/micro/org/openjdk/bench/java/lang/Arraycopy.java. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From sviswanathan at openjdk.java.net Tue Nov 23 22:46:04 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 23 Nov 2021 22:46:04 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 [v3] In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 19:01:53 GMT, Jatin Bhateja wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> update comment for avx3_threshold() with more details > > src/hotspot/cpu/x86/vm_version_x86.hpp line 920: > >> 918: // is set to 0 for these platforms. >> 919: static int avx3_threshold() { return ((is_intel_family_core() && >> 920: supports_serialize()) ? 0: AVX3Threshold); } > > Hi @sviswa7 , Should we not return a zero threshold only if user does not explicitly set AVX3Threshold i.e. in default case. @jatin-bhateja On these platforms it is beneficial to set the threshold to zero for copy and clear operations and hence the override. 
I have described that in the comment in detail as well. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From dholmes at openjdk.java.net Wed Nov 24 05:05:07 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 24 Nov 2021 05:05:07 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 [v3] In-Reply-To: References: Message-ID: <2zFKQ-o4UXauqYztn-Zu02_rCOJ57minkJaZvT7BShk=.249b47b5-e94e-419b-9c8a-e1342ebfe0a4@github.com> On Tue, 23 Nov 2021 22:43:03 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/vm_version_x86.hpp line 920: >> >>> 918: // is set to 0 for these platforms. >>> 919: static int avx3_threshold() { return ((is_intel_family_core() && >>> 920: supports_serialize()) ? 0: AVX3Threshold); } >> >> Hi @sviswa7 , Should we not return a zero threshold only if user does not explicitly set AVX3Threshold i.e. in default case. > > @jatin-bhateja On these platforms it is beneficial to set the threshold to zero for copy and clear operations and hence the override. I have described that in the comment in detail as well. @sviswa7 I tend to agree with @jatin-bhateja . AVX3Threshold is a diagnostic flag so if someone has deliberately modified it so they can measure something, your change will make that impossible on newer systems. You may want to define a static field to store the actual value for `avx3_threshold()` to return and initialize it during VM initialization. Or lazy initialize it on first use. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From dholmes at openjdk.java.net Wed Nov 24 05:19:06 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 24 Nov 2021 05:19:06 GMT Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 14:14:31 GMT, Erik Österlund wrote: > The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true.
That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM. Just a comment but it always concerns me that if we have to manually add a TBIVM when using a non-safepoint-checking lock then the lock is mis-classified as a non-safepoint-checking one! :( That aside changes look fine. A few grammatical nits in the test. Thanks, David src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 130: > 128: void MetaspaceCriticalAllocation::wait_for_purge(MetadataAllocationRequest* request) { > 129: for (;;) { > 130: ThreadBlockInVM tbivm(JavaThread::current()); Can't you move the TBIVM outside of the loop now that it is always created? test/hotspot/jtreg/vmTestbase/gc/gctests/LoadUnloadGC/LoadUnloadGC.java line 55: > 53: * VM Testbase keywords: [gc, stress, stressopt, nonconcurrent, monitoring] > 54: * VM Testbase readme: > 55: * In this test a 1000 classes are loaded and unloaded in a loop. nit: /a 1000/1000/ test/hotspot/jtreg/vmTestbase/gc/gctests/LoadUnloadGC/LoadUnloadGC.java line 57: > 55: * In this test a 1000 classes are loaded and unloaded in a loop. > 56: * Class0 gets loaded which results in Class1 getting loaded and so on all > 57: * the way uptill class1000. The classes should be unloaded whenever a nit: /uptill/up until/ or /up to/ test/hotspot/jtreg/vmTestbase/gc/gctests/LoadUnloadGC/LoadUnloadGC.java line 59: > 57: * the way uptill class1000. The classes should be unloaded whenever a > 58: * garbage collection takes place because their classloader is made unreachable > 59: * at the end of the each loop iteration. The loop is repeated 1000 times. nit: s/the each/each/ ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6520 From eosterlund at openjdk.java.net Wed Nov 24 08:27:37 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Wed, 24 Nov 2021 08:27:37 GMT Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts [v2] In-Reply-To: References: Message-ID: > The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM. Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: dholmes review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6520/files - new: https://git.openjdk.java.net/jdk/pull/6520/files/ec93bede..ea651174 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6520&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6520&range=00-01 Stats: 10 lines in 2 files changed: 2 ins; 2 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/6520.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6520/head:pull/6520 PR: https://git.openjdk.java.net/jdk/pull/6520 From eosterlund at openjdk.java.net Wed Nov 24 08:41:07 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Wed, 24 Nov 2021 08:41:07 GMT Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts [v2] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 05:11:23 GMT, David Holmes wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> dholmes review comments > > src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 130: > >> 128: void
MetaspaceCriticalAllocation::wait_for_purge(MetadataAllocationRequest* request) { >> 129: for (;;) { >> 130: ThreadBlockInVM tbivm(JavaThread::current()); > > Can't you move the TBIVM outside of the loop now that it is always created? > Just a comment but it always concerns me that if we have to manually add a TBIVM when using a non-safepoint-checking lock then the lock is mis-classified as a non-safepoint-checking one! :( > > That aside changes look fine. A few grammatical nits in the test. > > Thanks, David Thanks for the review David. I fixed your nits. I agree that this lock should preferably have been a safepoint checking lock. In fact, it *was* a safepoint checking lock when I wrote the code. But that was before we changed the locking rules so that whether we do safepoint checking or not is a function of the rank. After that, this lock has become constrained to the current low rank by the current set of other locks, and by being that low-ish rank, it is not allowed to safepoint check, unless I move a bunch of other lock ranks around, and figure out if it's okay for those locks to start safepoint checking as well, which isn't entirely obvious. So this lock might be a case where those new rules end up being a bit awkward, and have led this code to do manual transitions to blocked instead, as an escape hatch from the asserts.
------------- PR: https://git.openjdk.java.net/jdk/pull/6520 From dholmes at openjdk.java.net Wed Nov 24 09:02:09 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 24 Nov 2021 09:02:09 GMT Subject: RFR: 8277631: ZGC: CriticalMetaspaceAllocation asserts [v2] In-Reply-To: References: Message-ID: <5bQ6PmeDjpFBgxZ6-xVI4uKgo1LCiXXyDOsRkiQjkZg=.e551776e-1f2c-4610-923a-569852138f3f@github.com> On Wed, 24 Nov 2021 08:38:16 GMT, Erik Österlund wrote: >> src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 130: >> >>> 128: void MetaspaceCriticalAllocation::wait_for_purge(MetadataAllocationRequest* request) { >>> 129: for (;;) { >>> 130: ThreadBlockInVM tbivm(JavaThread::current()); >> >> Can't you move the TBIVM outside of the loop now that it is always created? > >> Just a comment but it always concerns me that if we have to manually add a TBIVM when using a non-safepoint-checking lock then the lock is mis-classified as a non-safepoint-checking one! :( >> >> That aside changes look fine. A few grammatical nits in the test. >> >> Thanks, David > > Thanks for the review David. I fixed your nits. > > I agree that this lock should preferably have been a safepoint checking lock. In fact, it *was* a safepoint checking lock when I wrote the code. But that was before we changed the locking rules so that whether we do safepoint checking or not is a function of the rank. After that, this lock has become constrained to the current low rank by the current set of other locks, and by being that low-ish rank, it is not allowed to safepoint check, unless I move a bunch of other lock ranks around, and figure out if it's okay for those locks to start safepoint checking as well, which isn't entirely obvious. > > So this lock might be a case where those new rules end up being a bit awkward, and have led this code to do manual transitions to blocked instead, as an escape hatch from the asserts. Ah! I forgot about the rank changes that forced this dichotomy.
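For readers following this thread, the scoping issue discussed for 8277631 can be sketched as a small standalone C++ model. It is only an illustration under stated assumptions: `ModelThread`, `BlockedScope` and `NoSafepointLocker` are hypothetical stand-ins for the thread state, `ThreadBlockInVM` and the locker on a non-safepoint-checking lock, not HotSpot code. The model records a "violation" where HotSpot would fire an assert, namely when the transition to blocked happens while the lock is held.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical model types; none of these are real HotSpot classes.
enum class ThreadState { InVM, Blocked };

struct ModelThread {
  ThreadState state = ThreadState::InVM;
  int no_safepoint_depth = 0;  // > 0 while a non-safepoint-checking lock is held
  bool violation = false;      // set where HotSpot would fire an assert
};

// Stands in for ThreadBlockInVM: the thread is "blocked" for the scope's lifetime.
class BlockedScope {
  ModelThread& _t;
 public:
  explicit BlockedScope(ModelThread& t) : _t(t) {
    if (_t.no_safepoint_depth > 0) _t.violation = true;  // transition while holding the lock
    _t.state = ThreadState::Blocked;
  }
  ~BlockedScope() {
    if (_t.no_safepoint_depth > 0) _t.violation = true;
    _t.state = ThreadState::InVM;
  }
};

// Stands in for a locker on a non-safepoint-checking lock.
class NoSafepointLocker {
  ModelThread& _t;
  std::lock_guard<std::mutex> _guard;
 public:
  NoSafepointLocker(ModelThread& t, std::mutex& m) : _t(t), _guard(m) {
    ++_t.no_safepoint_depth;
  }
  ~NoSafepointLocker() {
    assert(_t.no_safepoint_depth > 0);
    --_t.no_safepoint_depth;
  }
};

// Ordering before the fix: lock first, then transition to blocked.
bool lock_then_block() {
  ModelThread t;
  std::mutex m;
  {
    NoSafepointLocker locker(t, m);
    BlockedScope blocked(t);  // constructed while the lock is held
  }
  return t.violation;
}

// Ordering after the fix: the locker lives inside the blocked scope.
bool block_then_lock() {
  ModelThread t;
  std::mutex m;
  {
    BlockedScope blocked(t);
    NoSafepointLocker locker(t, m);
  }  // locker is destroyed first, then the blocked scope ends
  return t.violation;
}
```

In this model `lock_then_block()` reports a violation and `block_then_lock()` does not, because C++ destroys scoped objects in reverse order of construction, so the lock is always released before the thread leaves the blocked state.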
------------- PR: https://git.openjdk.java.net/jdk/pull/6520 From mcimadamore at openjdk.java.net Wed Nov 24 11:55:11 2021 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Wed, 24 Nov 2021 11:55:11 GMT Subject: Integrated: 8275063: Implementation of Foreign Function & Memory API (Second incubator) In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 11:16:51 GMT, Maurizio Cimadamore wrote: > This PR contains the API and implementation changes for JEP-419 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.java.net/jeps/419 This pull request has now been integrated. Changeset: 96e36071 Author: Maurizio Cimadamore URL: https://git.openjdk.java.net/jdk/commit/96e36071b63b624d56739b014b457ffc48147c4f Stats: 14700 lines in 193 files changed: 6958 ins; 5126 del; 2616 mod 8275063: Implementation of Foreign Function & Memory API (Second incubator) Reviewed-by: erikj, psandoz, jvernee, darcy ------------- PR: https://git.openjdk.java.net/jdk/pull/5907 From stuefe at openjdk.java.net Wed Nov 24 12:16:14 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 24 Nov 2021 12:16:14 GMT Subject: RFR: JDK-8275320: NMT should perform buffer overrun checks [v6] In-Reply-To: References: Message-ID: <2dExt0esGNAm7M_l0tiCAQvl6kuUMHCjArg2_KkD4aE=.679f5205-1053-4b2d-8631-8ad76a16c7ff@github.com> On Fri, 19 Nov 2021 14:29:17 GMT, Thomas Stuefe wrote: >> This is part of a number of RFE I plan to improve and simplify C-heap overflow checking in hotspot. >> >> This proposal adds NMT buffer overflow checking: >> >> - it gives us C-heap overflow checking in release builds >> - the costs are neglectable: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway. >> - NMT needs intact headers anyway. 
Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check. >> - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch. >> >> For more details, please see the JBS issue. >> >> ---- >> >> Patch notes: >> >> - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp. >> - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512. >> - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere. >> >> - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios. >> >> - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting. >> >> - I added a bunch of gtests to test various heap overwrite scenarios. 
I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE). >> >> - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now. >> >> - I made the assert for malloc site table width a compile time STATIC_ASSERT. >> >> -------------- >> >> Example output a buffer overrun would provide: >> >> >> Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) >> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: >> 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 >> 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00 >> 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)# >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805 >> # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) 
>> # >> >> ------- >> >> Tests: >> - manual tests with Linux x64, x86, minimal build >> - GHAs all clean >> - SAP nightlies ran for 4 weeks now without problems > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Volker Feedback 2 > - Fix Zhengyu Problem in os::realloc > - Extend gtests > - extend footer to 2 bytes > - Feedback Volker > - Let NMT do overflow detection Nightlies are clean. ------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From stuefe at openjdk.java.net Wed Nov 24 12:16:16 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 24 Nov 2021 12:16:16 GMT Subject: Integrated: JDK-8275320: NMT should perform buffer overrun checks In-Reply-To: References: Message-ID: On Thu, 14 Oct 2021 15:49:05 GMT, Thomas Stuefe wrote: > This is part of a number of RFE I plan to improve and simplify C-heap overflow checking in hotspot. > > This proposal adds NMT buffer overflow checking: > > - it gives us C-heap overflow checking in release builds > - the costs are neglectable: if NMT is off, we won't pay anything; if NMT is on, the added work is minuscule since we have to do malloc header management anyway. > - NMT needs intact headers anyway. Faced with buffer overwrites today, it would maybe crash or maybe account wrongly, but it's a bit of a lottery really. Better to go the extra step and do a real check. > - it could be a preparation for future code removal, if we wanted to do that (see details in umbrella RFE https://bugs.openjdk.java.net/browse/JDK-8275301). That way, net complexity would come down even with this patch. > > For more details, please see the JBS issue. > > ---- > > Patch notes: > > - The malloc header is changed such that it contains a 16-bit canary directly preceding the user payload of the allocation. The new malloc header does not use bitfields anymore but normal types. For more details, see the comment in mallocTracker.hpp. 
> - On 64-bit, we don't enlarge the malloc header. It remains 16 bytes in length. So no additional memory cost (apart from the 1-byte-footer, see below). Space for the canary is instead obtained by reducing the size of the bucket index bit field to 16 bits. That bit field is used to store the bucket slot index of the malloc site table in NMT detail mode. With 40 bits it was over-dimensioned, and even 16-bits arguably still are: malloc site table width is 512. > - On 32-bit, I had to enlarge the header from 8 bytes to 16 bytes to make room for a canary. But strictly speaking 8 bytes were not enough anyway: the header size has to be large enough to satisfy malloc(3) alignment, and that would be 16 bytes. I believe it never led to an error since we don't store 128bit data in malloc'd memory in the hotspot anywhere. > > - I added a footer canary trailing the user allocation to catch tail buffer overruns. To keep matters simple (alignment) I made it a single byte only. That is enough to catch most overrun scenarios. > > - I brushed up error reporting. When NMT detects corruption, it will now print out a hex dump of the corrupted area to tty before asserting. > > - I added a bunch of gtests to test various heap overwrite scenarios. I also had to extend the gtest macros a bit because I wanted these tests of course to run in release builds too, but we did not have a death test macro for release builds yet (there are possibilities for code simplification here too, but that's for another RFE). > > - I renamed `nmt_header_size` to `nmt_overhead` since that size includes header and footer now. > > - I made the assert for malloc site table width a compile time STATIC_ASSERT. > > -------------- > > Example output a buffer overrun would provide: > > > Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) 
> NMT Block at 0x00005600f86136b0, corruption at: 0x00005600f86136c1: > 0x00005600f86136a8: 21 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 > 0x00005600f86136b8: 00 00 00 00 0f 00 1f fa 00 61 00 00 00 00 00 00 > 0x00005600f86136c8: 41 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f86136f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613708: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613718: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613728: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x00005600f8613738: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > assert failed: fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?)# > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (mallocTracker.cpp:203), pid=10805, tid=10805 > # fatal error: Block at 0x00005600f86136b0: footer canary broken at 0x00005600f86136c1 (buffer overflow?) > # > > ------- > > Tests: > - manual tests with Linux x64, x86, minimal build > - GHAs all clean > - SAP nightlies ran for 4 weeks now without problems This pull request has now been integrated. 
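The header and footer canary scheme described in the patch notes above can be illustrated with a minimal standalone C++ sketch. Everything named here is an assumption made for illustration: `BlockHeader`, `tracked_malloc`, `check_block`, `tracked_free` and the two canary values are hypothetical and are not taken from `mallocTracker.hpp`. The sketch keeps a 16-byte header with a 16-bit canary directly preceding the payload and a single-byte footer, as in the description (the commit list notes the footer was later extended to 2 bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical canary values, chosen for illustration only.
static const std::uint16_t kHeaderCanary = 0xE99E;
static const std::uint8_t  kFooterCanary = 0xE8;

// Hypothetical 16-byte header; the canary is the last field, so it sits
// directly in front of the user payload.
struct BlockHeader {
  std::uint64_t size;    // payload size in bytes
  std::uint8_t  pad[6];  // stands in for other bookkeeping fields
  std::uint16_t canary;
};
static_assert(sizeof(BlockHeader) == 16, "header is expected to stay 16 bytes");

enum class BlockStatus { Ok, HeaderBroken, FooterBroken };

// Allocates header + payload + one footer byte and returns the payload pointer.
void* tracked_malloc(std::size_t size) {
  unsigned char* raw =
      static_cast<unsigned char*>(std::malloc(sizeof(BlockHeader) + size + 1));
  if (raw == nullptr) return nullptr;
  BlockHeader h;
  std::memset(&h, 0, sizeof(h));
  h.size = size;
  h.canary = kHeaderCanary;
  std::memcpy(raw, &h, sizeof(h));
  raw[sizeof(BlockHeader) + size] = kFooterCanary;
  return raw + sizeof(BlockHeader);
}

// Verifies both canaries of a block previously returned by tracked_malloc.
BlockStatus check_block(const void* payload) {
  assert(payload != nullptr);
  const unsigned char* p = static_cast<const unsigned char*>(payload);
  BlockHeader h;
  std::memcpy(&h, p - sizeof(BlockHeader), sizeof(h));
  if (h.canary != kHeaderCanary) return BlockStatus::HeaderBroken;
  if (p[h.size] != kFooterCanary) return BlockStatus::FooterBroken;
  return BlockStatus::Ok;
}

// Releases the whole block, including header and footer.
void tracked_free(void* payload) {
  if (payload == nullptr) return;
  std::free(static_cast<unsigned char*>(payload) - sizeof(BlockHeader));
}
```

Writing one byte past the end of an 8-byte allocation changes the footer byte, so `check_block` returns `FooterBroken`; writing to the byte just before the payload changes the header canary and yields `HeaderBroken`. A real implementation would report the corruption (for example with a hex dump, as in the output above) rather than return a status code.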
Changeset: cf7adae6 Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/cf7adae6333c7446048ef0364737927337631f63 Stats: 434 lines in 11 files changed: 385 ins; 11 del; 38 mod 8275320: NMT should perform buffer overrun checks 8275320: NMT should perform buffer overrun checks 8275301: Unify C-heap buffer overrun checks into NMT Reviewed-by: simonis, zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/5952 From zgu at openjdk.java.net Wed Nov 24 16:02:17 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 24 Nov 2021 16:02:17 GMT Subject: RFR: 8277797: Remove undefined/unused SharedRuntime::trampoline_size() Message-ID: A trivial patch to remove undefined and unused `SharedRuntime::trampoline_size()`, a leftover from [JDK-8263002](https://bugs.openjdk.java.net/browse/JDK-8263002). ------------- Commit messages: - v0 Changes: https://git.openjdk.java.net/jdk/pull/6540/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6540&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277797 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6540.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6540/head:pull/6540 PR: https://git.openjdk.java.net/jdk/pull/6540 From simonis at openjdk.java.net Wed Nov 24 16:41:27 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Wed, 24 Nov 2021 16:41:27 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter Message-ID: `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter. `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. 
This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. ------------- Commit messages: - 8275908: Record null_check traps for calls and array_check traps in the interpreter Changes: https://git.openjdk.java.net/jdk/pull/6541/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6541&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275908 Stats: 519 lines in 9 files changed: 509 ins; 2 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/6541.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6541/head:pull/6541 PR: https://git.openjdk.java.net/jdk/pull/6541 From sviswanathan at openjdk.java.net Wed Nov 24 16:55:32 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 24 Nov 2021 16:55:32 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 [v4] In-Reply-To: References: Message-ID: > Currently 32-byte instructions are used for small array copy and clear. > This can be optimized by using 64-byte instructions. > > Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Override threshold only if flag is default ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6512/files - new: https://git.openjdk.java.net/jdk/pull/6512/files/c90e7004..021bc659 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=02-03 Stats: 21 lines in 2 files changed: 14 ins; 6 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6512.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512 PR: https://git.openjdk.java.net/jdk/pull/6512 From sviswanathan at openjdk.java.net Wed Nov 24 16:55:33 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 24 Nov 2021 16:55:33 GMT Subject: RFR: 8277617: Optimize array copy and clear on x86_64 In-Reply-To: <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com> References: <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com> Message-ID: On Tue, 23 Nov 2021 06:49:07 GMT, David Holmes wrote: >> @dholmes-ora I have implemented your review comments. > > Sorry @sviswa7 but could you explain in the comment why/how `avx3_threshold` reporting zero impacts the use 64-byte load/store - the connection is not at all obvious for anyone not fully conversant with AVX3 and how it is used by the code. Thanks. @dholmes-ora @jatin-bhateja I have added a check for FLAG_IS_DEFAULT before overriding the threshold. Let me know if this looks ok to you. 
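The behaviour described here (override the threshold to zero only when the flag was left at its default) can be sketched as a standalone C++ model. This is an assumption-laden illustration, not the patch itself: `IntFlag` and `CpuFeatures` are hypothetical types that stand in for the `AVX3Threshold` flag with its `FLAG_IS_DEFAULT` state and for the `is_intel_family_core()` / `supports_serialize()` checks, and the numeric values used below are examples only.

```cpp
#include <cassert>

// Hypothetical model of a VM flag that remembers whether the user set it.
struct IntFlag {
  int  value;
  bool is_default;  // models FLAG_IS_DEFAULT(AVX3Threshold)
};

// Hypothetical model of the two CPU checks mentioned in the review.
struct CpuFeatures {
  bool intel_family_core;
  bool supports_serialize;
};

// Returns 0 on the newer platforms only if the user did not set the flag;
// an explicitly chosen value is always respected.
int avx3_threshold(const CpuFeatures& cpu, const IntFlag& flag) {
  assert(flag.value >= 0);
  return (cpu.intel_family_core && cpu.supports_serialize && flag.is_default)
             ? 0
             : flag.value;
}
```

With this shape, someone who sets the diagnostic flag on the command line to measure something still gets the value they asked for on the newer systems, which addresses the concern raised earlier in the review.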
------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From jbhateja at openjdk.java.net Wed Nov 24 18:39:14 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 24 Nov 2021 18:39:14 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v4] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 16:55:32 GMT, Sandhya Viswanathan wrote: >> Currently 32-byte instructions are used for small array copy and clear. >> This can be optimized by using 64-byte instructions. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Override threshold only if flag is default Thanks @sviswa7 , changes look good to me. Best Regards ------------- Marked as reviewed by jbhateja (Committer). PR: https://git.openjdk.java.net/jdk/pull/6512 From psandoz at openjdk.java.net Wed Nov 24 19:40:31 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Wed, 24 Nov 2021 19:40:31 GMT Subject: RFR: 8277155: Compress and expand vector operations Message-ID: Add two new cross-lane vector operations, `compress` and `expand`. An example of such usage might be code that selects elements from array `a` and stores those selected elements in array `z`:

    int[] a = ...;
    int[] z = ...;
    int ai = 0, zi = 0;
    while (ai < a.length) {
        IntVector av = IntVector.fromArray(SPECIES, a, ai);
        // query over elements of vector av
        // returning a mask marking elements of interest
        VectorMask<Integer> m = interestingBits(av, ...);
        IntVector zv = av.compress(m);
        zv.intoArray(z, zi, m.compress());
        ai += SPECIES.length();
        zi += m.trueCount();
    }

(There's also a more sophisticated version using `unslice` to coalesce matching elements with non-masked stores.) Given RDP 1 for 18 is getting close, 2021/12/09, we may not get this reviewed in time and included in [JEP 417](https://openjdk.java.net/jeps/417).
Still, I think it is worth starting the review now (the CSR is marked provisional). ------------- Commit messages: - 8277155: Compress and expand vector operations Changes: https://git.openjdk.java.net/jdk/pull/6545/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6545&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277155 Stats: 5429 lines in 105 files changed: 5315 ins; 21 del; 93 mod Patch: https://git.openjdk.java.net/jdk/pull/6545.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6545/head:pull/6545 PR: https://git.openjdk.java.net/jdk/pull/6545 From dholmes at openjdk.java.net Thu Nov 25 05:14:04 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 25 Nov 2021 05:14:04 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v4] In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 16:55:32 GMT, Sandhya Viswanathan wrote: >> Currently 32-byte instructions are used for small array copy and clear. >> This can be optimized by using 64-byte instructions. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Override threshold only if flag is default General change looks okay but I have a query below about startup overhead. Also what testing has been done for this aside from the benchmarking? AFAICS there is only a single test that currently sets AVX3Threshold to zero so we have very little test coverage for that. With this change it will be zero all the time on some systems and so will now be exercising code paths that do not normally get executed. Thanks, David src/hotspot/cpu/x86/vm_version_x86.cpp line 1893: > 1891: return AVX3Threshold; > 1892: } > 1893: } I am somewhat concerned about the overhead of evaluating this each time it is used. I realize these will only be startup costs while generating the stubs, not part of the stubs themselves, but it still may be a startup impact.
Can you run a startup benchmark to see if there is any problem? I was also thinking the more direct formulation would just be: ```return (is_intel_family_core() && supports_serialize() && FLAG_IS_DEFAULT(AVX3Threshold)) ? 0 : AVX3Threshold;``` ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From dholmes at openjdk.java.net Thu Nov 25 05:18:06 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 25 Nov 2021 05:18:06 GMT Subject: RFR: 8277797: Remove undefined/unused SharedRuntime::trampoline_size() In-Reply-To: References: Message-ID: <_mUzi5C5miYGHHXIwY2Hv3tCeZ1676_hZH9raqFn4Gw=.2d265a64-c041-4f73-a775-76d7f09470a3@github.com> On Wed, 24 Nov 2021 15:54:47 GMT, Zhengyu Gu wrote: > A trivial patch to remove undefined and unused `SharedRuntime::trampoline_size()`, a leftover from [JDK-8263002](https://bugs.openjdk.java.net/browse/JDK-8263002). Good and trivial. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6540 From stuefe at openjdk.java.net Thu Nov 25 05:28:07 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 25 Nov 2021 05:28:07 GMT Subject: RFR: 8277797: Remove undefined/unused SharedRuntime::trampoline_size() In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 15:54:47 GMT, Zhengyu Gu wrote: > A trivial patch to remove undefined and unused `SharedRuntime::trampoline_size()`, a leftover from [JDK-8263002](https://bugs.openjdk.java.net/browse/JDK-8263002). LGTM ------------- Marked as reviewed by stuefe (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6540 From eosterlund at openjdk.java.net Thu Nov 25 09:54:10 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 25 Nov 2021 09:54:10 GMT Subject: Integrated: 8277631: ZGC: CriticalMetaspaceAllocation asserts In-Reply-To: References: Message-ID: <-KsYYH2luM9cu--SGpZz61ELwEU5H9BKrpSnlmVDc10=.67bd4901-9edb-4ee4-a108-098b947644f3@github.com> On Tue, 23 Nov 2021 14:14:31 GMT, Erik Österlund wrote: > The MetaspaceCritical_lock is a non-safepoint checking lock. That implies that the allow VM block flag is true. That implies that taking that lock takes a NoSafepointVerifier. That causes an assert to fire when MetaspaceCriticalAllocation::wait_for_purge transitions to blocked with ThreadBlockInVM while holding the lock. The fix is to move the locker inside of the ThreadBlockInVM. This pull request has now been integrated. Changeset: 3034ae87 Author: Erik Österlund URL: https://git.openjdk.java.net/jdk/commit/3034ae87ce4b94c7dc40cfb5a96d6d1e87910bbf Stats: 39 lines in 2 files changed: 30 ins; 0 del; 9 mod 8277631: ZGC: CriticalMetaspaceAllocation asserts Reviewed-by: pliden, stefank, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/6520 From chagedorn at openjdk.java.net Thu Nov 25 10:32:08 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 25 Nov 2021 10:32:08 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter In-Reply-To: References: Message-ID: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com> On Wed, 24 Nov 2021 16:33:35 GMT, Volker Simonis wrote: > `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
> > `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. > > The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. Otherwise, it looks good to me! But would be good to get a second review for it. Nice test! src/hotspot/share/interpreter/interpreterRuntime.cpp line 834: > 832: THREAD); > 833: > 834: if(HAS_PENDING_EXCEPTION) { Missing space src/hotspot/share/opto/parseHelper.cpp line 301: > 299: > 300: #endif > 301: This line was probably deleted by mistake? 
src/hotspot/share/runtime/deoptimization.hpp line 436: > 434: > 435: static jint total_deoptimization_count(); > 436: static jint deoptimization_count(const char *reason_str, const char *action_str); Nit: Asterisk should be at the type: `const char* reason_str` test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 141: > 139: private static void printCounters(TestMode testMode, ImplicitException impExcp, Method throwImplicitException_m, int invocations) { > 140: System.out.println("testMode=" + testMode + " exception=" + impExcp + " invocations=" + invocations + "\n" + > 141: "decompilecount=" + WB.getMethodDecompileCount(throwImplicitException_m) + " " + `getMethodDecompileCount()` seems only to be used to print the counters here but is not verified otherwise. If it is not too complicated, could a specific test for it be added as well? test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 166: > 164: // Checks after the JIT-compiled test method has been invoked 'PerBytecodeTrapLimit' times. > 165: private static void checkTwo(TestMode testMode, ImplicitException impExcp, Exception ex, Method throwImplicitException_m, int invocations) { > 166: If I see that correctly, `checkTwo`, `checkThree` and `checkFour` differ only in whether they use `PerBytecodeTrapLimit` or `Tier0InvokeNotifyFreq` and could be merged together (if the omitted assertions in `checkThree` and `checkFour` for the exception message compared to `checkTwo` are valid to be added again). test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 287: > 285: checkTwo(testMode, impExcp, lastException, throwImplicitException_m, invocations); > 286: > 287: // Invoke compiled (or interpreted if JDK-8275908 isn't fixed) code 'Tier0InvokeNotifyFreq' times. As this is the fix for JDK-8275908, you can remove the comment about it :-) It's probably a leftover from JDK-8273563.
test/lib/sun/hotspot/WhiteBox.java line 321: > 319: return getMethodCompilationLevel0(method, isOsr); > 320: } > 321: public int getMethodDecompileCount(Executable method) { As this class is marked as `@Deprecated`, do we need to add the methods here as well? ------------- PR: https://git.openjdk.java.net/jdk/pull/6541 From duke at openjdk.java.net Thu Nov 25 10:50:10 2021 From: duke at openjdk.java.net (duke) Date: Thu, 25 Nov 2021 10:50:10 GMT Subject: Withdrawn: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 15:25:46 GMT, Volker Simonis wrote: > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
>
> For the attached JTreg test, we get the following exception in interpreter mode:
>
>     java.lang.NullPointerException: Cannot read the array length because "" is null
>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>         at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>
> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>
>     java.lang.NullPointerException
>
> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
>
>     java.lang.NullPointerException
>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>
> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>
>     java.lang.NullPointerException
>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>
> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time cost because all the extra work will be done at compile time.
>
> ## Implementation details
>
> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`.
With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. This pull request has been closed without being integrated. 
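
For readers who want to observe the baseline behaviour described above (hot implicit exceptions losing their stack trace under `-XX:+OmitStackTraceInFastThrow`), the following is a minimal sketch. It is not part of the withdrawn change; the class name `FastThrowDemo` and the methods `length` and `countStackless` are made up for illustration. Whether and when the JIT switches to the pre-allocated exception depends on the VM, its flags and its compilation heuristics, so the printed count is not predictable and may well be zero.

```java
public class FastThrowDemo {

    // Dereferencing a null array triggers an implicit NullPointerException;
    // there is no explicit 'throw' in this method.
    static int length(int[] a) {
        return a.length;
    }

    // Provokes the implicit exception 'iterations' times and counts how many
    // of the caught exceptions carried no stack frames at all.
    static int countStackless(int iterations) {
        int stackless = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                length(null);
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    stackless++;
                }
            }
        }
        return stackless;
    }

    public static void main(String[] args) {
        int iterations = 200_000;
        int stackless = countStackless(iterations);
        // The result depends on JIT behaviour; no particular value is guaranteed.
        System.out.println(stackless + " of " + iterations
                + " exceptions had no stack trace");
    }
}
```

Running the same class once with the default settings and once with `-XX:-OmitStackTraceInFastThrow` should make the difference visible: with the optimization disabled, every caught exception is expected to carry a full stack trace.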
------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From simonis at openjdk.java.net Thu Nov 25 10:57:11 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 25 Nov 2021 10:57:11 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter In-Reply-To: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com> References: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com> Message-ID: On Thu, 25 Nov 2021 09:12:42 GMT, Christian Hagedorn wrote: >> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter. >> >> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. >> >> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. 
>
> src/hotspot/share/interpreter/interpreterRuntime.cpp line 834:
>
>> 832: THREAD);
>> 833:
>> 834: if(HAS_PENDING_EXCEPTION) {
>
> Missing space

Fixed

> src/hotspot/share/opto/parseHelper.cpp line 301:
>
>> 299:
>> 300: #endif
>> 301:
>
> This line was probably deleted by mistake?

The line was actually deleted by my editor. I first wondered myself, but the file had an extra empty line at the end, which I think we discourage. So in the end I think my editor was right :)
But as no other changes remained in that file except the deleted empty line, I agree that it looks strange now and I'll restore the file to its initial state.

> src/hotspot/share/runtime/deoptimization.hpp line 436:
>
>> 434:
>> 435: static jint total_deoptimization_count();
>> 436: static jint deoptimization_count(const char *reason_str, const char *action_str);
>
> Nit: Asterisk should be at the type: `const char* reason_str`

Fixed

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From mdoerr at openjdk.java.net Thu Nov 25 11:01:06 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 25 Nov 2021 11:01:06 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter
In-Reply-To:
References:
Message-ID:

On Wed, 24 Nov 2021 16:33:35 GMT, Volker Simonis wrote:

> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
>
> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
> > The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. Nice change! Looks good to me besides what was already said. src/hotspot/share/opto/graphKit.cpp line 3342: > 3340: // A non-null value will always produce an exception. > 3341: if (!objtp->maybe_null()) { > 3342: bool aastore = (java_bc() == Bytecodes::_aastore); better: is_aastore ------------- PR: https://git.openjdk.java.net/jdk/pull/6541 From simonis at openjdk.java.net Thu Nov 25 11:01:07 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 25 Nov 2021 11:01:07 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter In-Reply-To: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com> References: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com> Message-ID: <9QL7aEAuYwXuPda-c9w-jJwZ787hAIIj0OBsf5c6K8I=.c3c158fb-a62b-450a-9676-9e072631b585@github.com> On Thu, 25 Nov 2021 09:58:19 GMT, Christian Hagedorn wrote: >> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter. >> >> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. 
This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. >> >> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. > > test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 287: > >> 285: checkTwo(testMode, impExcp, lastException, throwImplicitException_m, invocations); >> 286: >> 287: // Invoke compiled (or interpreted if JDK-8275908 isn't fixed) code 'Tier0InvokeNotifyFreq' times. > > As this is the fix for JDK-8275908, you can remove the comment about it :-) It's probably a leftover from JDK- > 8273563. Right :) Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/6541 From simonis at openjdk.java.net Thu Nov 25 11:06:06 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 25 Nov 2021 11:06:06 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 10:54:47 GMT, Martin Doerr wrote: >> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter. >> >> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. 
This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. >> >> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. > > src/hotspot/share/opto/graphKit.cpp line 3342: > >> 3340: // A non-null value will always produce an exception. >> 3341: if (!objtp->maybe_null()) { >> 3342: bool aastore = (java_bc() == Bytecodes::_aastore); > > better: is_aastore Fixed both occurrences. ------------- PR: https://git.openjdk.java.net/jdk/pull/6541 From simonis at openjdk.java.net Thu Nov 25 11:11:08 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 25 Nov 2021 11:11:08 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter In-Reply-To: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com> References: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com> Message-ID: On Thu, 25 Nov 2021 10:14:42 GMT, Christian Hagedorn wrote: >> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter. 
>>
>> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation.
>>
>> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future.
>
> test/lib/sun/hotspot/WhiteBox.java line 321:
>
>> 319: return getMethodCompilationLevel0(method, isOsr);
>> 320: }
>> 321: public int getMethodDecompileCount(Executable method) {
>
> As this class is marked as `@Deprecated`, do we need to add the methods here as well?

Unfortunately yes :(
I first didn't, but there are still tests using it and they'll get warnings otherwise. To make matters worse, some of them parse the output, so I gave up and added the methods to `sun/hotspot/WhiteBox.java` as well.

------------- PR: https://git.openjdk.java.net/jdk/pull/6541 From simonis at openjdk.java.net Thu Nov 25 17:26:03 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 25 Nov 2021 17:26:03 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter In-Reply-To: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com> References: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com> Message-ID: On Thu, 25 Nov 2021 09:29:12 GMT, Christian Hagedorn wrote: >> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter. >> >> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. >> >> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. 
>
> test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 141:
>
>> 139: private static void printCounters(TestMode testMode, ImplicitException impExcp, Method throwImplicitException_m, int invocations) {
>> 140: System.out.println("testMode=" + testMode + " exception=" + impExcp + " invocations=" + invocations + "\n" +
>> 141: "decompilecount=" + WB.getMethodDecompileCount(throwImplicitException_m) + " " +
>
> `getMethodDecompileCount()` seems to be used only to print the counters here and is not verified otherwise. If it is not too complicated, could a specific test for it be added as well?

Hm, it was used in JDK-8275908 before this fix. Now, with this fix we don't get any recompiles any more :)

The test is already quite big, so I've added a separate test `test/hotspot/jtreg/compiler/uncommontrap/Decompile.java` to verify the new WhiteBox methods introduced by this change.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From simonis at openjdk.java.net Thu Nov 25 17:45:08 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 25 Nov 2021 17:45:08 GMT
Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter
In-Reply-To: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
References: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com>
Message-ID: <6SeGNDIxY4gsishHn0dj2OZVwqH_jiDgxtC0HeviSTs=.13441674-bea2-48d5-9cc0-5b5bd914af18@github.com>

On Thu, 25 Nov 2021 09:45:56 GMT, Christian Hagedorn wrote:

>> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter.
>> >> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. >> >> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. > > test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 166: > >> 164: // Checks after the JIT-compiled test method has been invoked 'PerBytecodeTrapLimit' times. >> 165: private static void checkTwo(TestMode testMode, ImplicitException impExcp, Exception ex, Method throwImplicitException_m, int invocations) { >> 166: > > If I see that correctly, `checkTwo`, `checkThree` and `checkFour` only differ in whether using `PerBytecodeTrapLimit` or `Tier0InvokeNotifyFreq` and could be merged together (if the omitted assertions in `checkThree` and `checkFour` for the exception message compared to `checkTwo` are valid to be added again). That's a good point, now that the tests have got a little simpler :) I'll have to add more cases for JDK-8273563 but I think it's just fair to leave it for that change. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6541 From simonis at openjdk.java.net Thu Nov 25 17:51:45 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 25 Nov 2021 17:51:45 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter [v2] In-Reply-To: References: Message-ID: <1sOmEVcu-uqZ5EIzL95kbSfMUiioSEwVXRkYEZSbYMs=.7805f384-e601-43e5-9ada-b070475bedf5@github.com> > `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter. > > `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. > > The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Simplified test OptimizeImplicitExceptions.java and added Decompile.java test. 
Includes minor fixes requested by Martin and Christian ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6541/files - new: https://git.openjdk.java.net/jdk/pull/6541/files/6d12c341..58a107db Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6541&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6541&range=00-01 Stats: 230 lines in 6 files changed: 164 ins; 48 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/6541.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6541/head:pull/6541 PR: https://git.openjdk.java.net/jdk/pull/6541 From simonis at openjdk.java.net Thu Nov 25 17:55:07 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 25 Nov 2021 17:55:07 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter [v2] In-Reply-To: References: Message-ID: <41ebcqQZPbOaMQk6xOSHbRKEbKRPrWRzvZZ-LAawZcM=.53f0048e-6d5c-46fb-9f64-1caa9b8906e9@github.com> On Thu, 25 Nov 2021 10:58:13 GMT, Martin Doerr wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplified test OptimizeImplicitExceptions.java and added Decompile.java test. Includes minor fixes requested by Martin and Christian > > Nice change! Looks good to me besides what was already said. @TheRealMDoerr, @chhagedorn thanks a lot for the quick reviews. I hope I could address all your concerns and suggestions with my latest push. @chhagedorn: I'm especially happy that you like the tests. As all too often the effort for a good test is much higher than for the fix itself :) ------------- PR: https://git.openjdk.java.net/jdk/pull/6541 From duke at openjdk.java.net Fri Nov 26 07:41:24 2021 From: duke at openjdk.java.net (Vishal Chand) Date: Fri, 26 Nov 2021 07:41:24 GMT Subject: RFR: 8277372: Add getters for BOT and card table members Message-ID: Changed the visibility, added getters and refactored the following: 1. 
Card Table Members 2. BOT members 3. ObjectStartArray block members ------------- Commit messages: - Initial patch Changes: https://git.openjdk.java.net/jdk/pull/6570/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277372 Stats: 199 lines in 31 files changed: 40 ins; 11 del; 148 mod Patch: https://git.openjdk.java.net/jdk/pull/6570.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6570/head:pull/6570 PR: https://git.openjdk.java.net/jdk/pull/6570 From chagedorn at openjdk.java.net Fri Nov 26 09:23:18 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 26 Nov 2021 09:23:18 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter [v2] In-Reply-To: <1sOmEVcu-uqZ5EIzL95kbSfMUiioSEwVXRkYEZSbYMs=.7805f384-e601-43e5-9ada-b070475bedf5@github.com> References: <1sOmEVcu-uqZ5EIzL95kbSfMUiioSEwVXRkYEZSbYMs=.7805f384-e601-43e5-9ada-b070475bedf5@github.com> Message-ID: On Thu, 25 Nov 2021 17:51:45 GMT, Volker Simonis wrote: >> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter. >> >> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. >> >> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). 
It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Simplified test OptimizeImplicitExceptions.java and added Decompile.java test. Includes minor fixes requested by Martin and Christian Thanks for doing the changes, they look good to me! > @chhagedorn: I'm especially happy that you like the tests. As all too often the effort for a good test is much higher than for the fix itself :) Yes, I couldn't agree more to that. It's sometimes underestimated how much time that is needed to come up with a good test. So, I always appreciate the extra effort :) ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6541 From chagedorn at openjdk.java.net Fri Nov 26 09:23:19 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 26 Nov 2021 09:23:19 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter [v2] In-Reply-To: References: <8LRhErgZM4EAfnBSnynflcCchsz3z8Ao9vADyEEY16w=.389b897d-87f9-44da-bb67-ad942daaafa9@github.com> Message-ID: On Thu, 25 Nov 2021 10:52:02 GMT, Volker Simonis wrote: >> src/hotspot/share/opto/parseHelper.cpp line 301: >> >>> 299: >>> 300: #endif >>> 301: >> >> This line was probably deleted by mistake? > > The line was actually deleted by my editor. I first wondered myself, but the file had an extra empty line et the end which I think we discourage. So at the end I think my editor was right :) > But as there remained no other changes in that file except the deleted empty line I agree that it looks strange now and I'll restore the file to its initial state. 
I agree that we should then remove this line to follow the convention but as you've said, it might not be justified to fix these things in otherwise untouched files. Maybe a thing to fix for the next one who edits this file ;) >> test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 141: >> >>> 139: private static void printCounters(TestMode testMode, ImplicitException impExcp, Method throwImplicitException_m, int invocations) { >>> 140: System.out.println("testMode=" + testMode + " exception=" + impExcp + " invocations=" + invocations + "\n" + >>> 141: "decompilecount=" + WB.getMethodDecompileCount(throwImplicitException_m) + " " + >> >> `getMethodDecompileCount()` seems only to be used to print the counters here but is not verified otherwise. If it is is not too complicated, could a specific test for it be added as well? > > Hm, it was used in JDK-8275908 before this fix. Now, with this fix we don't get any recompiles any more :) > > The tests is already quite big so I've added a separate test `test/hotspot/jtreg/compiler/uncommontrap/Decompile.java` to verify the new WhiteBox methods introduced by this change. Thanks for the effort to add an extra extensive test for it, it looks good! >> test/hotspot/jtreg/compiler/exceptions/OptimizeImplicitExceptions.java line 166: >> >>> 164: // Checks after the JIT-compiled test method has been invoked 'PerBytecodeTrapLimit' times. >>> 165: private static void checkTwo(TestMode testMode, ImplicitException impExcp, Exception ex, Method throwImplicitException_m, int invocations) { >>> 166: >> >> If I see that correctly, `checkTwo`, `checkThree` and `checkFour` only differ in whether using `PerBytecodeTrapLimit` or `Tier0InvokeNotifyFreq` and could be merged together (if the omitted assertions in `checkThree` and `checkFour` for the exception message compared to `checkTwo` are valid to be added again). 
>
> That's a good point, now that the tests have got a little simpler :)
>
> I'll have to add more cases for JDK-8273563 but I think it's just fair to leave it for that change.

That sounds good.

>> test/lib/sun/hotspot/WhiteBox.java line 321:
>>
>>> 319: return getMethodCompilationLevel0(method, isOsr);
>>> 320: }
>>> 321: public int getMethodDecompileCount(Executable method) {
>>
>> As this class is marked as `@Deprecated`, do we need to add the methods here as well?
>
> Unfortunately yes :(
> I first didn't, but there are still tests using it and they'll get warnings otherwise. To make matters worse, some of them parse the output, so I gave up and added the methods to `sun/hotspot/WhiteBox.java` as well.

Oh I see, that's indeed unfortunate. I wasn't aware of that. Then better leave them in until the entire class gets removed at some point. Thanks for the explanation!

-------------

PR: https://git.openjdk.java.net/jdk/pull/6541

From rkennke at openjdk.java.net Fri Nov 26 09:49:15 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Fri, 26 Nov 2021 09:49:15 GMT
Subject: Integrated: 8277417: C1 LIR instruction for load-klass
In-Reply-To:
References:
Message-ID:

On Thu, 18 Nov 2021 20:16:27 GMT, Roman Kennke wrote:

> In C1, the load of a Klass* out of an object is currently identified by a load of type T_ADDRESS with offset oopDesc::klass_offset_in_bytes(). When encountering such a load, the result may be decoded when running with +UseCompressedClassPointers. This is problematic and ugly: if we ever emit a T_ADDRESS load with offset 8 or 4 (== klass_offset_in_bytes) that is not a Klass*, we would attempt to decode the result. We have been lucky so far.
>
> Also, in Lilliput, we want to do something entirely different there, and need to be able to emit more complex code, possibly including a runtime call.
>
> The change introduces a new C1 LIR opcode OpLoadKlass, and refactors the implementations in c1_LIRAssembler_xyz.cpp to emit the code there, instead of mem2reg().
Notice that I could not test anything but x86, all other platforms only received very basic testing via GHA. It would be nice if respective maintainers could give it a try. > > Testing: > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) This pull request has now been integrated. Changeset: 99e4bda3 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/99e4bda303f2c71972a125d0ecaf4cf986c8614a Stats: 189 lines in 11 files changed: 140 ins; 37 del; 12 mod 8277417: C1 LIR instruction for load-klass Reviewed-by: iveresov, mdoerr, ngasson, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/6464 From duke at openjdk.java.net Fri Nov 26 10:33:34 2021 From: duke at openjdk.java.net (Vishal Chand) Date: Fri, 26 Nov 2021 10:33:34 GMT Subject: RFR: 8277372: Add getters for BOT and card table members [v2] In-Reply-To: References: Message-ID: > Changed the visibility, added getters and refactored the following: > > 1. Card Table Members > 2. BOT members > 3. ObjectStartArray block members Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: Refactoring in hotspot/cpu dir ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6570/files - new: https://git.openjdk.java.net/jdk/pull/6570/files/1ffc2d3d..bb85aa48 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=00-01 Stats: 21 lines in 9 files changed: 0 ins; 0 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/6570.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6570/head:pull/6570 PR: https://git.openjdk.java.net/jdk/pull/6570 From simonis at openjdk.java.net Fri Nov 26 11:13:29 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Fri, 26 Nov 2021 11:13:29 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter [v3] In-Reply-To: References: 
<1sOmEVcu-uqZ5EIzL95kbSfMUiioSEwVXRkYEZSbYMs=.7805f384-e601-43e5-9ada-b070475bedf5@github.com> Message-ID: On Fri, 26 Nov 2021 09:20:09 GMT, Christian Hagedorn wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Decompile.java test for non-JVMCI builds > > Thanks for doing the changes, they look good to me! > >> @chhagedorn: I'm especially happy that you like the tests. As all too often the effort for a good test is much higher than for the fix itself :) > > Yes, I couldn't agree more with that. It's sometimes underestimated how much time is needed to come up with a good test. So, I always appreciate the extra effort :) Thanks for the approval @chhagedorn. I found that the new `Decompile.java` test failed on linux/x86_32. The reason for this is that 32-bit builds don't include JVMCI and without JVMCI the bimorphic inlining trap is called just `bimorphic` (in contrast to `bimorphic_or_optimized_type_check` for JVMCI builds). The fix is trivial and I hope your approval is also valid for the latest version :) ------------- PR: https://git.openjdk.java.net/jdk/pull/6541 From simonis at openjdk.java.net Fri Nov 26 11:13:29 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Fri, 26 Nov 2021 11:13:29 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter [v3] In-Reply-To: References: Message-ID: > `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter. > > `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations.
This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. > > The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Fix Decompile.java test for non-JVMCI builds ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6541/files - new: https://git.openjdk.java.net/jdk/pull/6541/files/58a107db..c8564d08 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6541&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6541&range=01-02 Stats: 11 lines in 1 file changed: 4 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/6541.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6541/head:pull/6541 PR: https://git.openjdk.java.net/jdk/pull/6541 From mdoerr at openjdk.java.net Fri Nov 26 11:28:03 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 26 Nov 2021 11:28:03 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter [v3] In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 11:13:29 GMT, Volker Simonis wrote: >> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter. 
>> >> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. >> >> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix Decompile.java test for non-JVMCI builds Right, non-JVMCI platforms need this fix (also PPC and s390). LGTM, now. We'll retest it. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6541 From chagedorn at openjdk.java.net Fri Nov 26 11:50:08 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 26 Nov 2021 11:50:08 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter [v3] In-Reply-To: References: Message-ID: <5wWSpqtrhuJgNpKzSkpZcEEwcO3UXv9ZzJspyVu96zo=.32f86314-d08f-4a37-8e39-3ed58da05c18@github.com> On Fri, 26 Nov 2021 11:13:29 GMT, Volker Simonis wrote: >> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter. 
>> >> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. >> >> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix Decompile.java test for non-JVMCI builds Good catch! Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6541 From lkorinth at openjdk.java.net Fri Nov 26 12:35:03 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Fri, 26 Nov 2021 12:35:03 GMT Subject: RFR: 8269537: memset() is called after operator new [v4] In-Reply-To: References: Message-ID: On Wed, 20 Oct 2021 09:36:38 GMT, Leo Korinth wrote: >> The basic problem is that we are relying on undefined behaviour, as documented in the code: >> >> // This whole business of passing information from ResourceObj::operator new >> // to the ResourceObj constructor via fields in the "object" is technically UB. >> // But it seems to work within the limitations of HotSpot usage (such as no >> // multiple inheritance) with the compilers and compiler options we're using. >> // And it gives some possibly useful checking for misuse of ResourceObj. 
>> >> >> I am removing the undefined behaviour by passing the type of allocation through a thread local variable. >> >> This solution has some advantages: >> 1) it is not UB >> 2) it is simpler and easier to understand >> 3) it uses less memory (I could make it use even less if I made the enum `allocation_type` a u8) >> 4) in the *very* unlikely situation that stack memory (or embedded) already equals the data calculated from the address of the object, the code will also work. >> >> When doing the change, I also updated `allocated_on_stack()` to the new name `allocated_on_stack_or_embedded()` which is much harder to misinterpret. >> >> I also disallow to "fake" the memory type by explicitly calling `ResourceObj::set_allocation_type`. >> >> This forced me to change two places that is faking the allocation type of an embedded `GrowableArray` from `STACK_OR_EMBEDDED` to `C_HEAP`. The faking of the type is hard to understand as a `STACK_OR_EMBEDDED` `GrowableArray` can allocate any type of object. My guess is that `GrowableArray` has changed behaviour, or maybe that it was hard to understand because the old naming of `allocated_on_stack()`. >> >> I have also tried to update the comments. In doing that I not only changed the comments for this change, but also for the *incorrect* advice to always delete object you allocate with new. >> >> Testing on debug build tier1-3 >> Testing on release build tier1 > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > review updates This comment will keep this pull request alive a bit longer. 
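As a rough illustration of the thread-local hand-off described in the PR text above, here is a minimal standalone C++ sketch. It is not the HotSpot code: `TrackedObj`, `AllocationType`, `pending_type_` and `tracked_obj_demo` are hypothetical names invented for this example, only two allocation kinds are modelled, and the real `ResourceObj` may differ in every detail.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical allocation kinds; HotSpot's real enum has more values.
enum class AllocationType { StackOrEmbedded, CHeap };

// Hypothetical stand-in for ResourceObj, for illustration only.
class TrackedObj {
 public:
  // Placement form: record the requested type in a thread-local
  // before the constructor runs on the same thread.
  static void* operator new(std::size_t size, AllocationType type) {
    void* p = ::operator new(size);
    pending_type_ = type;
    return p;
  }
  // Matching placement delete, used only if the constructor throws.
  static void operator delete(void* p, AllocationType) {
    pending_type_ = AllocationType::StackOrEmbedded;
    ::operator delete(p);
  }
  static void operator delete(void* p) { ::operator delete(p); }

  // The constructor consumes the pending type and resets it, so an
  // object created without operator new sees StackOrEmbedded.
  TrackedObj() : type_(pending_type_) {
    pending_type_ = AllocationType::StackOrEmbedded;
  }
  // Copies describe their own storage, not the source's.
  TrackedObj(const TrackedObj&) : TrackedObj() {}
  TrackedObj& operator=(const TrackedObj&) { return *this; }

  AllocationType allocation_type() const { return type_; }
  bool allocated_on_stack_or_embedded() const {
    return type_ == AllocationType::StackOrEmbedded;
  }

 private:
  static thread_local AllocationType pending_type_;
  AllocationType type_;
};

thread_local AllocationType TrackedObj::pending_type_ =
    AllocationType::StackOrEmbedded;

// Usage example: a heap object reports CHeap, a local one does not.
inline void tracked_obj_demo() {
  TrackedObj* on_heap = new (AllocationType::CHeap) TrackedObj();
  TrackedObj on_stack;
  assert(on_heap->allocation_type() == AllocationType::CHeap);
  assert(on_stack.allocated_on_stack_or_embedded());
  delete on_heap;
}
```

Nothing is read from the raw memory of the object before construction, which is the point of the approach: the constructor only reads a well-defined thread-local. The sketch does not handle constructor arguments that themselves create `TrackedObj` instances; that case would need extra care.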
------------- PR: https://git.openjdk.java.net/jdk/pull/5387 From simonis at openjdk.java.net Fri Nov 26 16:24:13 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Fri, 26 Nov 2021 16:24:13 GMT Subject: Integrated: 8275908: Record null_check traps for calls and array_check traps in the interpreter In-Reply-To: References: Message-ID: <53AX6UcR3p-pozMTAUGFL5x080jMy7BYYRF0Qa1ats8=.2035356c-1c61-4e9d-8985-29eb5fad4847@github.com> On Wed, 24 Nov 2021 16:33:35 GMT, Volker Simonis wrote: > `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). This change fixes the problem in the interpreter. > > `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. > > The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. This pull request has now been integrated. 
Changeset: 40fef231 Author: Volker Simonis URL: https://git.openjdk.java.net/jdk/commit/40fef2311c95eca0ec34652f9fc0e56b827b8380 Stats: 637 lines in 9 files changed: 628 ins; 1 del; 8 mod 8275908: Record null_check traps for calls and array_check traps in the interpreter Reviewed-by: chagedorn, mdoerr ------------- PR: https://git.openjdk.java.net/jdk/pull/6541 From simonis at openjdk.java.net Fri Nov 26 16:24:10 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Fri, 26 Nov 2021 16:24:10 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter [v3] In-Reply-To: <5wWSpqtrhuJgNpKzSkpZcEEwcO3UXv9ZzJspyVu96zo=.32f86314-d08f-4a37-8e39-3ed58da05c18@github.com> References: <5wWSpqtrhuJgNpKzSkpZcEEwcO3UXv9ZzJspyVu96zo=.32f86314-d08f-4a37-8e39-3ed58da05c18@github.com> Message-ID: <9fnUl67C4gTEG1KUH1p3pEwamphpZy4XB91ShOxyVO0=.7ddfd8d5-04a9-4960-b980-78a9e8d19bdf@github.com> On Fri, 26 Nov 2021 11:47:11 GMT, Christian Hagedorn wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Decompile.java test for non-JVMCI builds > > Good catch! Looks good. Thanks @chhagedorn, @TheRealMDoerr ! ------------- PR: https://git.openjdk.java.net/jdk/pull/6541 From dholmes at openjdk.java.net Sat Nov 27 05:00:12 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sat, 27 Nov 2021 05:00:12 GMT Subject: RFR: 8275908: Record null_check traps for calls and array_check traps in the interpreter [v3] In-Reply-To: References: Message-ID: <41uVcf2dhmwZ_JLdSZz-njNJYeWpG_VYiJ_5W4vM-fA=.f0fccd47-82ea-4758-87d2-3832ebe083d9@github.com> On Fri, 26 Nov 2021 11:13:29 GMT, Volker Simonis wrote: >> `null_checks` occurring at invoke bytecodes are currently not recorded by the profiler. This leads to unnecessary uncommon traps, deoptimizations and recompilations for exceptions which already occurred before the compilation (i.e. are "hot"). 
This change fixes the problem in the interpreter. >> >> `array_checks` are currently recorded as `class_checks` in the interpreter and therefore not recognized by the compiler. This again leads to uncommon traps, deoptimizations and recompilations. This change unifies the handling of `array_checks` in the interpreter and compiler and prevents unnecessary recompilation. >> >> The test is a stripped down version of a test which was developed for [JDK-8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow](https://bugs.openjdk.java.net/browse/JDK-8273563) (still [under review](https://github.com/openjdk/jdk/pull/5488)). It introduces an extension to the Whitebox API to expose the decompile, deopt and trap counters which is also required for testing [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563). I think (and hope) it will also be helpful for others in the future. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Fix Decompile.java test for non-JVMCI builds The new tests are not written correctly and are causing failures in our tier3 CI runs. If you set an explicit GC on the @run line then you have to ensure no GC option is passed in when running jtreg, else you get two GC's requested and the test fails to run. https://bugs.openjdk.java.net/browse/JDK-8277878 ------------- PR: https://git.openjdk.java.net/jdk/pull/6541 From duke at openjdk.java.net Sat Nov 27 10:01:07 2021 From: duke at openjdk.java.net (duke) Date: Sat, 27 Nov 2021 10:01:07 GMT Subject: Withdrawn: 8218885: Restore pop_frame and force_early_return functionality for Graal In-Reply-To: References: Message-ID: On Wed, 22 Sep 2021 05:40:40 GMT, Tom Rodriguez wrote: > This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5625 From tschatzl at openjdk.java.net Sat Nov 27 12:13:03 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Sat, 27 Nov 2021 12:13:03 GMT Subject: RFR: 8277372: Add getters for BOT and card table members [v2] In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 10:33:34 GMT, Vishal Chand wrote: >> Changed the visibility, added getters and refactored the following: >> >> 1. Card Table Members >> 2. BOT members >> 3. ObjectStartArray block members > > Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: > > Refactoring in hotspot/cpu dir @tstuefe : can you check whether the s390 and ppc changes still compile? The changes look straightforward enough, but... Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/6570 From stuefe at openjdk.java.net Sat Nov 27 14:30:04 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 27 Nov 2021 14:30:04 GMT Subject: RFR: 8277372: Add getters for BOT and card table members [v2] In-Reply-To: References: Message-ID: On Sat, 27 Nov 2021 12:10:17 GMT, Thomas Schatzl wrote: >> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactoring in hotspot/cpu dir > > @tstuefe : can you check whether the s390 and ppc changes still compile? The changes look straightforward enough, but... > > Thanks, > Thomas @tschatzl We have s390 + ppcle builds now in GHAs thanks to Alexey, and they do look fine (https://github.com/openjdk/jdk/pull/6570/checks?check_run_id=4341321057). Seeing how simple the platform changes are, I think this is okay. 
Cheers, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/6570 From jbhateja at openjdk.java.net Sat Nov 27 14:52:04 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sat, 27 Nov 2021 14:52:04 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v4] In-Reply-To: References: Message-ID: On Thu, 25 Nov 2021 05:08:42 GMT, David Holmes wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Override threshold only if flag is default > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1893: > >> 1891: return AVX3Threshold; >> 1892: } >> 1893: } > > I am somewhat concerned about the overhead of evaluating this each time it is used. I realize these will only be startup costs while generating the stubs, not part of the stubs themselves, but it still may be a startup impact. Can you run a startup benchmark to see if there is any problem? > > I was also thinking the more direct formulation would just be: > ```return (is_intel_family_core() && supports_serialize() && FLAG_IS_DEFAULT(AVX3Threshold)) ? 0 : AVX3Threshold;``` Hi @sviswa7, I agree with @dholmes-ora: instead of calling it multiple times in a stub, can we not call it only once per stub? Since stubs are assembled once and not relocated, it should be okay to call this method only once for the stubs which are going to benefit from this change. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From tobias.hartmann at oracle.com Mon Nov 29 09:01:05 2021 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 29 Nov 2021 10:01:05 +0100 Subject: [External] : RFC: improving NMethod code locality in CodeCache In-Reply-To: <18BB091D-7983-48B1-BD0D-A333D8B81226@amazon.com> References: <18BB091D-7983-48B1-BD0D-A333D8B81226@amazon.com> Message-ID: <2e87598c-fb18-5189-8cb6-5e791133161f@oracle.com> Hi Evgeny, Thanks for sharing these results and starting the discussion. Some comments below.
On 23.11.21 18:34, Astigeevich, Evgeny wrote: > We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded. Is it really a problem with branch prediction or more with instruction caching? With the current implementation, the hot instructions of a single nmethod are already contiguous but different nmethods might be located far away (and there's lots of metadata in-between). (Re-)moving the metadata will improve locality but does that really have an effect on branch prediction? Did you gather some numbers via hardware performance counters (iCache, ITLB, branch prediction misses)? > The data show that due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. The data also show the most contributors of non-executable data are the header and scopes sections. Arm64 vs x86_64 looks consistent except the stub code. On arm64 the size of the stub code is 4-5 times bigger. > > We'd like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data. It would definitely be nice to have this as an option (rather than replacing the current implementation) but I wonder how feasible it is. There is lots of code that depends on the current layout and we would need to make all of that dependent on a flag. > According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) "Support non-continuous CodeBlobs in HotSpot", NMethod sections can be located in different places of memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache. Different parts of memory for a nmethod need to be allocated/released.
Ever since I finished the implementation of the Segmented Code Cache (https://openjdk.java.net/jeps/197), I wanted to work on this but never got to it. I think that the additional complexity in the code cache is worth it but of course that has to be proven by a performance evaluation. For reference, here's my old thesis and the paper we published back then: http://cr.openjdk.java.net/~thartmann/papers/2014-Code_Cache_Optimizations-thesis.pdf http://cr.openjdk.java.net/~thartmann/papers/2014-PPPJ-Efficient_Code_Cache_Management.pdf > There is JDK-7072317 "move metadata from CodeCache" (https://bugs.openjdk.java.net/browse/JDK-7072317) under which the implementation work can be done. Yes, that makes sense. > There can be different approaches for the implementation: > > 1. What to separate: > a. All code (main plus stub) from other sections. > b. Or only main code because this is the code where an application should spend most of the time. > c. Or the header and scope sections. I would say that from a performance perspective, only the main code matters because the stubs are used for slow paths. If it simplifies prototyping, I would go with b) first. > 2. Where to put: > a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin. > b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset. > c. Or in a completely different place (C-heap, Metaspace,...) It depends on what we want to improve: (i) Code locality in the same nmethod or (ii) code locality between different nmethods. Solution b) would only improve code locality in the same nmethod but the overall layout of executable code in the code cache would still be sparse.
I think c) would be the ideal solution: The code cache would only contain executable code and all the metadata would be somewhere else. But solution a) would lead to the same layout and might be easier to implement. > It needs to be investigated if the separation of sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property. Yes, that is a concern. A thorough performance evaluation is required. > We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317. Hope that helps. I'm curious what others think. Best regards, Tobias From tschatzl at openjdk.java.net Mon Nov 29 09:52:09 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 29 Nov 2021 09:52:09 GMT Subject: RFR: 8277372: Add getters for BOT and card table members [v2] In-Reply-To: References: Message-ID: On Fri, 26 Nov 2021 10:33:34 GMT, Vishal Chand wrote: >> Changed the visibility, added getters and refactored the following: >> >> 1. Card Table Members >> 2. BOT members >> 3. ObjectStartArray block members > > Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: > > Refactoring in hotspot/cpu dir I will push it through our testing, particular the changes for the SA agent (in `vmstructs_gc.hpp`) are always good to double-check. src/hotspot/share/gc/shared/blockOffsetTable.hpp line 56: > 54: static uint _LogN_words; > 55: static uint _N_bytes; > 56: static uint _N_words; The `private` visibility modifier can be removed as this is default at the top of a class. The static variables should start with a lower case letter after the underscore, something like `_log_n`. My suggestion would also be to change `N`/`n` to something more understandable, like `size`, and add `block`, i.e. 
something like `_log_block_size`, `_log_block_size_in_words` similar to the corresponding `CardTable` members etc. ------------- Changes requested by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6570 From duke at openjdk.java.net Mon Nov 29 12:12:21 2021 From: duke at openjdk.java.net (xpbob) Date: Mon, 29 Nov 2021 12:12:21 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr Message-ID: Unsafe is used in many Java frameworks. When a framework has an Unsafe memory leak, there is no way to know what code is causing it. This change adds an unsafe allocation event to JFR. The event records the size and the stack of the allocation. It is off by default. ------------- Commit messages: - 8277930: Add unsafe allocation event to jfr Changes: https://git.openjdk.java.net/jdk/pull/6591/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277930 Stats: 18 lines in 4 files changed: 16 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/6591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591 PR: https://git.openjdk.java.net/jdk/pull/6591 From lutz.schmidt at sap.com Mon Nov 29 12:19:09 2021 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 29 Nov 2021 12:19:09 +0000 Subject: [External] : RFC: improving NMethod code locality in CodeCache In-Reply-To: <2e87598c-fb18-5189-8cb6-5e791133161f@oracle.com> References: <18BB091D-7983-48B1-BD0D-A333D8B81226@amazon.com> <2e87598c-fb18-5189-8cb6-5e791133161f@oracle.com> Message-ID: <3C21E64B-42BA-4818-8623-6B518E00D97D@sap.com> Hi, a few thoughts immediately popped up when reading Evgeny's RFC and Tobias' comments. If my comments seem influenced by s390x - that might well be. It's the architecture I know best. - The biggest concern I have relates to pc-relative addressing. o nmethod constants are currently located next to the instruction section. Putting them into a separately allocated area may break the pc-relative limit.
s390x limit: +/- 4GB, no fallback implemented. o relative branches either are + short distance, mostly intra-nmethod + long distance, mostly inter-nmethod + not possible in general, e.g., runtime calls The branch optimization (in shorten_branches) might less often be possible. One example would be if stub code is moved to a separately allocated area. - When considering performance, it is beneficial to have data which is being patched (frequently) separated from the instruction stream. s390x: never modify data in a cache line where instructions are fetched from. That will kill your performance big time. - I'm not a branch prediction expert. Instruction stream compactness may have an influence if the prediction engine not only remembers the branch direction, but the (limited length) distance as well. Thanks, Lutz On 29.11.21, 10:03, "hotspot-dev on behalf of Tobias Hartmann" wrote: Hi Evgeny, Thanks for sharing these results and starting the discussion. Some comments below. On 23.11.21 18:34, Astigeevich, Evgeny wrote: > We have cases where the CodeCache contains more than 15,000 compiled methods. In these cases, we saw a negative performance effect. The hot executable code is not contiguous, so branch prediction hardware can become overloaded. Is it really a problem with branch prediction or more with instruction caching? With the current implementation, the hot instructions of a single nmethod are already contiguous but different nmethods might be located far away (and there's lots of metadata in-between). (Re-)moving the metadata will improve locality but does that really have an effect on branch prediction? Did you gather some numbers via hardware performance counters (iCache, ITLB, branch prediction misses)? > The data show that due to intervening non-executable data in NMethods, executable code is sparse in the CodeCache. The data also show the most contributors of non-executable data are the header and scopes sections.
Arm64 vs x86_64 looks consistent except the stub code. On arm64 the size of the stub code is 4-5 times bigger. > > We'd like to have an option to configure the CodeCache to support C2 nmethods with separated executable code and non-executable data. It would definitely be nice to have this as an option (rather than replacing the current implementation) but I wonder how feasible it is. There is lots of code that depends on the current layout and we would need to make all of that dependent on a flag. > According to the fixed JDK-8152664 (https://bugs.openjdk.java.net/browse/JDK-8152664) "Support non-continuous CodeBlobs in HotSpot", NMethod sections can be located in different places of memory. The discussion of it: https://mail.openjdk.java.net/pipermail/hotspot-dev/2016-April/022500.html. Separating code will complicate maintenance of the CodeCache. Different parts of memory for a nmethod need to be allocated/released.
Ever since I finished the implementation of the Segmented Code Cache (https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fopenjdk.java.net%2Fjeps%2F197&data=04%7C01%7Clutz.schmidt%40sap.com%7C17b6b19707b845d65b6308d9b316d9b6%7C42f7676cf455423c82f6dc2d99791af7%7C0%7C0%7C637737734063133916%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=ylfS6p71bpm7XmNRfG0vjSw6ZqRPOoJvSRujzYkQz8g%3D&reserved=0), I wanted to work on this but never got to it. I think that the additional complexity in the code cache is worth it but of course that has to be proven by a performance evaluation. For reference, here's my old thesis and the paper we published back then: https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fcr.openjdk.java.net%2F~thartmann%2Fpapers%2F2014-Code_Cache_Optimizations-thesis.pdf&data=04%7C01%7Clutz.schmidt%40sap.com%7C17b6b19707b845d65b6308d9b316d9b6%7C42f7676cf455423c82f6dc2d99791af7%7C0%7C0%7C637737734063143871%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=8KgOtwbSULPN%2FlUz10%2B9itGl%2Fmmvm6bV4y6D%2BcsT%2Bu4%3D&reserved=0 https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fcr.openjdk.java.net%2F~thartmann%2Fpapers%2F2014-PPPJ-Efficient_Code_Cache_Management.pdf&data=04%7C01%7Clutz.schmidt%40sap.com%7C17b6b19707b845d65b6308d9b316d9b6%7C42f7676cf455423c82f6dc2d99791af7%7C0%7C0%7C637737734063143871%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=gDYHJdpnK1%2FgcxDGZsYJ0X0Ku%2BIwS9KWrk8ggSfUVt0%3D&reserved=0 > There is JDK-7072317 ?move metadata from CodeCache? 
(https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.openjdk.java.net%2Fbrowse%2FJDK-7072317&data=04%7C01%7Clutz.schmidt%40sap.com%7C17b6b19707b845d65b6308d9b316d9b6%7C42f7676cf455423c82f6dc2d99791af7%7C0%7C0%7C637737734063143871%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=p6sjPC9HXMlydHk5mi4DlQh2ZOG4MYvcLte%2FAz%2B3ZbU%3D&reserved=0) which the implementation works can be done under. Yes, that makes sense. > There can be different approaches for the implementation: > > 1. What to separate: > a. All code (main plus stub) from other sections. > b. Or only main code because this is the code where an application should spend most of the time. > c. Or the header and scope sections. I would say that from a performance perspective, only the main code matters because the stubs are used for slow paths. If it simplifies prototyping, I would go with b) first. > 2. Where to put: > a. Different segments for code and nmethod data. This will require updating NMethod because it uses code_offset, stub_offset from header_begin. > b. The same segment but in a different part (e.g., code grows from lower addresses upwards and metadata from high addresses downwards). This might allow keeping NMethod using code_offset, stub_offset. > c. Or in a completely different place (C-heap, Metaspace,...) It depends on what we want to improve: (i) Code locality in the same nmethod or (ii) code locality between different nmethods. Solution b) would only improve code locality in the same nmethod but the overall layout of executable code in the code cache would still be sparse. I think c) would be the ideal solution: The code cache would only contain executable code and all the metadata would be somewhere else. But solution a) would lead to the same layout and might be easier to implement. 
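
For readers who want something concrete for option 2b (code growing upwards from the low end of a segment, metadata growing downwards from the high end), the following is a minimal sketch in Java. It is not HotSpot code and makes no claim about how the CodeCache or CodeHeap would actually be changed; `TwoEndedSegment`, `allocateCode`, `allocateData` and `freeBytes` are hypothetical names, offsets are plain integers rather than addresses, and freeing is left out entirely.

```java
import java.util.OptionalInt;

/**
 * Hypothetical two-ended bump allocator over one segment of `capacity` bytes.
 * Code is placed at increasing offsets from 0, metadata at decreasing offsets
 * from `capacity`; the segment is full when the two regions would meet.
 */
final class TwoEndedSegment {
    private int codeTop;     // first free offset above the code region
    private int dataBottom;  // lowest offset used by the metadata region

    TwoEndedSegment(int capacity) {
        this.codeTop = 0;
        this.dataBottom = capacity;
    }

    /** Reserves `size` bytes at the low end; empty if the request does not fit. */
    OptionalInt allocateCode(int size) {
        if (size < 0 || size > freeBytes()) {
            return OptionalInt.empty();
        }
        int start = codeTop;
        codeTop += size;
        return OptionalInt.of(start);
    }

    /** Reserves `size` bytes at the high end; empty if the request does not fit. */
    OptionalInt allocateData(int size) {
        if (size < 0 || size > freeBytes()) {
            return OptionalInt.empty();
        }
        dataBottom -= size;
        return OptionalInt.of(dataBottom);
    }

    /** Bytes still available between the code region and the metadata region. */
    int freeBytes() {
        return dataBottom - codeTop;
    }
}
```

In this toy model all code ends up contiguous at the low end of the segment, which is the locality property under discussion. The sketch does not address the harder question raised in the thread, namely how the two parts of one nmethod are released together.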
> It needs to be investigated if the separation of sections which are frequently accessed during the normal execution of the code (e.g., oop section) affects the performance negatively. We might need to change NMethodSweeper to preserve the code locality property. Yes, that is a concern. A thorough performance evaluation is required. > We would like to get feedback on the above approaches (or something different) before implementing JDK-7072317. Hope that helps. I'm curious what others think. Best regards, Tobias From duke at openjdk.java.net Mon Nov 29 13:02:09 2021 From: duke at openjdk.java.net (Vishal Chand) Date: Mon, 29 Nov 2021 13:02:09 GMT Subject: RFR: 8277372: Add getters for BOT and card table members [v2] In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 09:41:54 GMT, Thomas Schatzl wrote: >> Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactoring in hotspot/cpu dir > > src/hotspot/share/gc/shared/blockOffsetTable.hpp line 56: > >> 54: static uint _LogN_words; >> 55: static uint _N_bytes; >> 56: static uint _N_words; > > The `private` visibility modifier can be removed as this is default at the top of a class. > The static variables should start with a lower case letter after the underscore, something like `_log_n`. > > My suggestion would also be to change `N`/`n` to something more understandable, like `size`, and add `block`, i.e. something like `_log_block_size`, `_log_block_size_in_words` similar to the corresponding `CardTable` members etc. > > Edit: note that "block" isn't a good word to use here, so scratch that - "block" is any kind of area that is more generic than an object, but does not refer to the BOT entry. As I can understand, we need to replace "N" with something meaningful. Does something like "entry_size" or "bot_entry_size" would work? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6570 From zgu at openjdk.java.net Mon Nov 29 14:04:11 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 29 Nov 2021 14:04:11 GMT Subject: RFR: 8277797: Remove undefined/unused SharedRuntime::trampoline_size() In-Reply-To: <_mUzi5C5miYGHHXIwY2Hv3tCeZ1676_hZH9raqFn4Gw=.2d265a64-c041-4f73-a775-76d7f09470a3@github.com> References: <_mUzi5C5miYGHHXIwY2Hv3tCeZ1676_hZH9raqFn4Gw=.2d265a64-c041-4f73-a775-76d7f09470a3@github.com> Message-ID: On Thu, 25 Nov 2021 05:14:59 GMT, David Holmes wrote: >> A trivial patch to remove undefined and unused `SharedRuntime::trampoline_size()`, a leftover from [JDK-8263002](https://bugs.openjdk.java.net/browse/JDK-8263002). > > Good and trivial. > > Thanks, > David Thanks, @dholmes-ora @tstuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/6540 From zgu at openjdk.java.net Mon Nov 29 14:04:11 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 29 Nov 2021 14:04:11 GMT Subject: Integrated: 8277797: Remove undefined/unused SharedRuntime::trampoline_size() In-Reply-To: References: Message-ID: On Wed, 24 Nov 2021 15:54:47 GMT, Zhengyu Gu wrote: > A trivial patch to remove undefined and unused `SharedRuntime::trampoline_size()`, a leftover from [JDK-8263002](https://bugs.openjdk.java.net/browse/JDK-8263002). This pull request has now been integrated. Changeset: 05ab1767 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/05ab1767684bee0a3b8c8214c610beafaad058f9 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8277797: Remove undefined/unused SharedRuntime::trampoline_size() Reviewed-by: dholmes, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/6540 From shade at openjdk.java.net Mon Nov 29 14:10:54 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 29 Nov 2021 14:10:54 GMT Subject: RFR: 8277893: Arraycopy stress tests Message-ID: I would like to fork the new tests off the JDK-8150730. 
These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. A brief tour of these tests: - Tests all data types; - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. Test times: # x86_64 (TR 3970X) real 9m11.037s user 78m2.766s sys 0m19.873s # x86_32 (TR 3970X) real 13m39.054s user 147m38.308s sys 0m10.924s # x86_64 (i5-11500) real 41m32.622s user 447m19.986s sys 0m21.026s # AArch64 (ThunderX2) real 5m34.210s user 45m16.015s sys 0m24.723s Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. 
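
As an illustration of the "small arrays exhaustively" item in the list above, here is a minimal sketch in Java. It is not taken from the tests in the pull request; `SmallArraycopyCheck` and its methods are hypothetical names, only `int[]` is covered, and a plain loop is assumed to be an acceptable reference implementation. It tries every combination of source position, destination position and length, once with source and destination being the same array (conjoint) and once with separate arrays (disjoint).

```java
import java.util.Arrays;

final class SmallArraycopyCheck {

    /** Reference copy written as a plain loop; copies backwards when a forward copy would clobber. */
    static void referenceCopy(int[] src, int srcPos, int[] dst, int dstPos, int len) {
        if (src == dst && srcPos < dstPos) {
            for (int i = len - 1; i >= 0; i--) {
                dst[dstPos + i] = src[srcPos + i];
            }
        } else {
            for (int i = 0; i < len; i++) {
                dst[dstPos + i] = src[srcPos + i];
            }
        }
    }

    /** Array with distinct, non-zero contents so misplaced elements are visible. */
    static int[] fresh(int size) {
        int[] a = new int[size];
        for (int i = 0; i < size; i++) {
            a[i] = i + 1;
        }
        return a;
    }

    /** Returns how many (size, srcPos, dstPos, len) cases differ from the reference copy. */
    static int countMismatches(int maxSize) {
        int mismatches = 0;
        for (int size = 0; size <= maxSize; size++) {
            for (int srcPos = 0; srcPos <= size; srcPos++) {
                for (int dstPos = 0; dstPos <= size; dstPos++) {
                    int maxLen = size - Math.max(srcPos, dstPos);
                    for (int len = 0; len <= maxLen; len++) {
                        // Conjoint case: source and destination are the same array.
                        int[] expected = fresh(size);
                        referenceCopy(expected, srcPos, expected, dstPos, len);
                        int[] actual = fresh(size);
                        System.arraycopy(actual, srcPos, actual, dstPos, len);
                        if (!Arrays.equals(expected, actual)) {
                            mismatches++;
                        }
                        // Disjoint case: separate source and destination arrays.
                        int[] src = fresh(size);
                        int[] expectedDst = new int[size];
                        referenceCopy(src, srcPos, expectedDst, dstPos, len);
                        int[] actualDst = new int[size];
                        System.arraycopy(src, srcPos, actualDst, dstPos, len);
                        if (!Arrays.equals(expectedDst, actualDst)) {
                            mismatches++;
                        }
                    }
                }
            }
        }
        return mismatches;
    }
}
```

Calling `SmallArraycopyCheck.countMismatches(16)` should return 0 on a correct VM. Exercising the different stub variants would additionally need the VM flags mentioned above, which this sketch does not attempt.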
Additional testing: - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` ------------- Commit messages: - Ready for review Changes: https://git.openjdk.java.net/jdk/pull/6594/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6594&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8277893 Stats: 1181 lines in 12 files changed: 1181 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6594.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6594/head:pull/6594 PR: https://git.openjdk.java.net/jdk/pull/6594 From jvernee at openjdk.java.net Mon Nov 29 14:42:03 2021 From: jvernee at openjdk.java.net (Jorn Vernee) Date: Mon, 29 Nov 2021 14:42:03 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 12:06:02 GMT, xpbob wrote: > Unsafe is used in many Java frameworks. > When the framework has a unsafe memory leak , there is no way to know what code is causing it. > Add unsafe allocation event to jfr. > Records the size and stack allocated. > This event is off by default An event like this would help to find allocation sites. I'd suggest also adding an event for `Unsafe::freeMemory`, as well as recording the memory address in both event types. With that, it should be possible to match up allocations with frees, and leaks could be identified by looking for allocations that don't have a corresponding free with the same address. ------------- PR: https://git.openjdk.java.net/jdk/pull/6591 From duke at openjdk.java.net Mon Nov 29 14:52:17 2021 From: duke at openjdk.java.net (Scott Gibbons) Date: Mon, 29 Nov 2021 14:52:17 GMT Subject: RFR: 8277358: Accelerate CRC32-C Message-ID: Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement. 
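
For orientation, here is a minimal sketch in Java of the two code paths that the figures in this message refer to, `CRC32C.update(byte[])` and `CRC32C.update(ByteBuffer)`. It is not the `TestCRC32C.java` test; `Crc32cCheck` and its methods are hypothetical names. The throughput helper assumes MB means 10^6 bytes, which matches the reported numbers: 512 bytes times 20000000 iterations divided by 1.710387358 seconds gives about 5986.95 MB/s.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32C;

final class Crc32cCheck {

    /** CRC32-C of the whole array via the byte[] path. */
    static long crcOfArray(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** CRC32-C of the same bytes via the ByteBuffer path. */
    static long crcOfBuffer(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(ByteBuffer.wrap(data));
        return crc.getValue();
    }

    /** Throughput in MB/s, assuming 1 MB = 1,000,000 bytes. */
    static double throughputMBps(long totalBytes, double seconds) {
        return totalBytes / seconds / 1_000_000.0;
    }
}
```

With the same helper, the reported before/after figures work out to speedups of roughly 4.0x for the `byte[]` path and 4.3x for the `ByteBuffer` path.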
5986.947899319073 MB/s => 24041.05203089616 MB/s
5840.02689336947 MB/s => 24898.781468710356 MB/s

********** Original ***********
scottgi@96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000
offset = 0
msgSize = 512 bytes
iters = 20000000
-------------------------------------------------------
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
CRC32C.update(byte[]) runtime = 1.710387358 seconds
CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
-------------------------------------------------------
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds
CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
-------------------------------------------------------

*********** With my changes: *************
scottgi@96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000
offset = 0
msgSize = 512 bytes
iters = 20000000
-------------------------------------------------------
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
CRC32C.update(byte[]) runtime = 0.425938099 seconds
CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
-------------------------------------------------------
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds
CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s
CRCs: crc = ae10ee5a, crcReference = ae10ee5a
-------------------------------------------------------

-------------

Commit messages:
 - Merge branch 'asgibbons-crc32c' of https://github.com/asgibbons/jdk into asgibbons-crc32c
 - Merge branch 'master' into asgibbons-crc32c
 - Asgibbons crc32c (#7)
 - Merge branch 'openjdk:master' into master
 - Revert .gitignore change
 - Move register save to within conditional; add comments
 - Bad merge.
 - Merge branch 'asgibbons-crc32c' of https://github.com/asgibbons/jdk into asgibbons-crc32c
 - ZZMerge branch 'asgibbons-crc32c' of https://github.com/asgibbons/jdk into asgibbons-crc32c
 - Use existing CRC32 code with different table for CRC32-C
 - ... and 203 more: https://git.openjdk.java.net/jdk/compare/e9b36a83...10aeaec6

Changes: https://git.openjdk.java.net/jdk/pull/6595/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6595&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277358
  Stats: 62 lines in 4 files changed: 40 ins; 1 del; 21 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6595.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6595/head:pull/6595

PR: https://git.openjdk.java.net/jdk/pull/6595

From tschatzl at openjdk.java.net  Mon Nov 29 16:55:08 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Mon, 29 Nov 2021 16:55:08 GMT
Subject: RFR: 8277372: Add getters for BOT and card table members [v2]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 29 Nov 2021 12:58:58 GMT, Vishal Chand wrote:

>> src/hotspot/share/gc/shared/blockOffsetTable.hpp line 56:
>>
>>> 54:   static uint _LogN_words;
>>> 55:   static uint _N_bytes;
>>> 56:   static uint _N_words;
>>
>> The `private` visibility modifier can be removed as this is default at the top of a class.
>> The static variables should start with a lower case letter after the underscore, something like `_log_n`.
>>
>> My suggestion would also be to change `N`/`n` to something more understandable, like `size`, and add `block`, i.e. something like `_log_block_size`, `_log_block_size_in_words` similar to the corresponding `CardTable` members etc.
>>
>> Edit: note that "block" isn't a good word to use here, so scratch that - "block" is any kind of area that is more generic than an object, but does not refer to the BOT entry.
>
> As I can understand, we need to replace "N" with something meaningful. Does something like "entry_size" or "bot_entry_size" would work?
I would think that `bot_entry_size` is one byte. Probably "bot_card_size"?

-------------

PR: https://git.openjdk.java.net/jdk/pull/6570

From ddong at openjdk.java.net  Mon Nov 29 17:47:18 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Mon, 29 Nov 2021 17:47:18 GMT
Subject: RFR: 8277948: AArch64: Print the correct stack if -XX:+PreserveFramePointer when crash
Message-ID: 

Hi,

I found that the native stack frames in the hs log are not accurate sometimes on AArch64, not sure if this is a known issue or an issue worth fixing.

The following steps can quick reproduce the problem:

1. apply the diff(comment the dtrace_object_alloc call in interpreter and make a crash on SharedRuntime::dtrace_object_alloc)

index 39e99bdd5ed..4fc768e94aa 100644
--- a/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
@@ -3558,6 +3558,7 @@ void TemplateTable::_new() {
       __ store_klass_gap(r0, zr);  // zero klass gap for compressed oops
       __ store_klass(r0, r4);      // store klass last

+/**
     {
       SkipIfEqual skip(_masm, &DTraceAllocProbes, false);
       // Trigger dtrace event for fastpath
@@ -3567,6 +3568,7 @@ void TemplateTable::_new() {
       __ pop(atos); // restore the return value

     }
+*/
     __ b(done);
   }

diff --git a/src/hotspot/cpu/x86/templateTable_x86.cpp b/src/hotspot/cpu/x86/templateTable_x86.cpp
index 19530b7c57c..15b0509da4c 100644
--- a/src/hotspot/cpu/x86/templateTable_x86.cpp
+++ b/src/hotspot/cpu/x86/templateTable_x86.cpp
@@ -4033,6 +4033,7 @@ void TemplateTable::_new() {
     Register tmp_store_klass = LP64_ONLY(rscratch1) NOT_LP64(noreg);
     __ store_klass(rax, rcx, tmp_store_klass);  // klass

+/**
     {
       SkipIfEqual skip_if(_masm, &DTraceAllocProbes, 0);
       // Trigger dtrace event for fastpath
@@ -4041,6 +4042,7 @@ void TemplateTable::_new() {
            CAST_FROM_FN_PTR(address, static_cast(SharedRuntime::dtrace_object_alloc)), rax);
       __ pop(atos);
     }
+*/
     __ jmp(done);
   }

diff --git a/src/hotspot/share/runtime/sharedRuntime.cpp b/src/hotspot/share/runtime/sharedRuntime.cpp
index a5de65ea5ab..60b4bd3bcc8 100644
--- a/src/hotspot/share/runtime/sharedRuntime.cpp
+++ b/src/hotspot/share/runtime/sharedRuntime.cpp
@@ -1002,6 +1002,7 @@ jlong SharedRuntime::get_java_tid(Thread* thread) {
  * 6254741.  Once that is fixed we can remove the dummy return value.
  */
 int SharedRuntime::dtrace_object_alloc(oopDesc* o) {
+  *(int*)0 = 1;
   return dtrace_object_alloc(Thread::current(), o, o->size());
 }


2. `java -XX:+DTraceAllocProbes -Xcomp -XX:-PreserveFramePointer -version`

On x86_64, the native stack in hs log is complete, but in AArch64, the native stack is incorrect.

In the beginning, I thought it might be the influence of PreserveFramePointer. Later, I found that no matter whether PreserveFramePointer is enabled or not, in the hs log of x86_64, the native stack is always correct, and aarch64 is wrong.

After some investigation, I found that this problem is related to the layout of the stack.

On x86_64, whether it is C/C++, interpreter, or JIT, `callee` will always put the `return address` and `fp` of the `caller` at the bottom of the stack. Hence, `callee` can always get the `caller sp`(aka `sender sp`) by `fp + 2`, and if `caller` is a compiled method, `caller sp` is the key to getting the `caller`'s `caller` since `caller fp` may be invalid.(see frame::sender_for_compiled_frame).

    push   %rbp
    mov    %rsp,%rbp

                            _ _ _ _ _ _
                           |           |
        |                  |_ _ _ _ _ _|
        |                  |           |
        |         caller   |           | <- caller sp
        |          _ _ _   |_ _ _ _ _ _|
        | expand           |           |
        |                  | ret addr  |
        | direction callee |_ _ _ _ _ _|
        |                  |           |
        V                  | caller fp | <- fp
                           |_ _ _ _ _ _|

But for AArch64, the C/C++ code doesn't put the `return address` and `fp` of the `caller` at the bottom of the stack. Hence, we cannot use `fp + 2` to calculate the proper `caller sp`(although it is still implemented this way).

When `caller` is a C1/C2 method A, and `callee` a C/C++ method B, we cannot get the `caller` of A since we cannot get the proper sp value of it.

    stp    x29, x30, [sp, #-N]!
    mov    x29, sp

                            _ _ _ _ _ _
                           |           |
        |                  |_ _ _ _ _ _|
        |                  |           |
        |         caller   |           | <- caller sp
        |          _ _ _   |_ _ _ _ _ _|   -
        | expand                           |
        |                   . . . . .      |
        | direction         _ _ _ _ _ _    |
        |                  |           |   | N
        |                  | ret addr  |   |
        |         callee   |_ _ _ _ _ _|   |
        |                  |           |   -
        V                  | caller fp | <- fp
                           |_ _ _ _ _ _|

I am not very familiar with AArch64 and have no idea how to fix this issue perfectly at current.

Based on my understanding of the implementation, we can get the correct stack trace when PreserveFramePointer is enabled. Although PreserveFramePointer is disabled by default, I found that some real applications will enable it in the production environment. Therefore, in my opinion, this fix can help troubleshoot crash issues in applications that enable PreserveFramePointer on AArch64 platform.

This patch changes the logic of l_sender_sp calculation, uses sender_sp() as the value of l_sender_sp when PreserveFramePointer is enabled.

Any input is appreciated.

Thanks,
Denghui

-------------

Commit messages:
 - 8277948: AArch64: Print the correct stack if -XX:+PreserveFramePointer when crash

Changes: https://git.openjdk.java.net/jdk/pull/6597/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6597&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277948
  Stats: 13 lines in 4 files changed: 11 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6597.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6597/head:pull/6597

PR: https://git.openjdk.java.net/jdk/pull/6597

From sviswanathan at openjdk.java.net  Mon Nov 29 23:30:36 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Mon, 29 Nov 2021 23:30:36 GMT
Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v5]
In-Reply-To: 
References: 
Message-ID: <5Dmjboa1Vh9PwwZnwfiTpmrXSm0D7sNMMm92bufbGSM=.d8494927-8a16-4efb-84b7-15086809d13f@github.com>

> Currently 32-byte instructions are used for small array copy and clear.
> This can be optimized by using 64-byte instructions.
>
> Please review.
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6512/files - new: https://git.openjdk.java.net/jdk/pull/6512/files/021bc659..b44b63ed Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=03-04 Stats: 17 lines in 3 files changed: 2 ins; 3 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/6512.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512 PR: https://git.openjdk.java.net/jdk/pull/6512 From sviswanathan at openjdk.java.net Tue Nov 30 00:10:39 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 30 Nov 2021 00:10:39 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6] In-Reply-To: References: Message-ID: > Currently 32-byte instructions are used for small array copy and clear. > This can be optimized by using 64-byte instructions. > > Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Fix whitespace ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6512/files - new: https://git.openjdk.java.net/jdk/pull/6512/files/b44b63ed..190f974c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6512&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6512.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6512/head:pull/6512 PR: https://git.openjdk.java.net/jdk/pull/6512 From sviswanathan at openjdk.java.net Tue Nov 30 00:43:04 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 30 Nov 2021 00:43:04 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs In-Reply-To: <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com> References: <1KoRjoyObIS32kwNcojcLdIdUkdqpL1Pon6-IIn-H94=.a986a7bb-a14b-4df8-9ab2-9c66650e6d1b@github.com> Message-ID: On Tue, 23 Nov 2021 06:49:07 GMT, David Holmes wrote: >> @dholmes-ora I have implemented your review comments. > > Sorry @sviswa7 but could you explain in the comment why/how `avx3_threshold` reporting zero impacts the use 64-byte load/store - the connection is not at all obvious for anyone not fully conversant with AVX3 and how it is used by the code. Thanks. @dholmes-ora @jatin-bhateja I have implemented your review comments. I have used the direct formulation for avx3_threshold() method as suggested by David. Reused the avx3_threshold() computation where possible as suggested by Jatin. The tier1-tier3 testing passed on the platform where avx3_threshold() returns 0. No additional observable overhead seen in SPECjvm2008 startup benchmarks on AVX512 platform. Please let me know if the patch looks ok to you. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From duke at openjdk.java.net Tue Nov 30 02:14:37 2021 From: duke at openjdk.java.net (xpbob) Date: Tue, 30 Nov 2021 02:14:37 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v2] In-Reply-To: References: Message-ID: > Unsafe is used in many Java frameworks. > When the framework has a unsafe memory leak , there is no way to know what code is causing it. > Add unsafe allocation event to jfr. > Records the size and stack allocated. > This event is off by default xpbob has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8277930 - 8277930: Add unsafe allocation event to jfr ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6591/files - new: https://git.openjdk.java.net/jdk/pull/6591/files/a30f3618..d883f62d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=00-01 Stats: 450 lines in 28 files changed: 164 ins; 169 del; 117 mod Patch: https://git.openjdk.java.net/jdk/pull/6591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591 PR: https://git.openjdk.java.net/jdk/pull/6591 From ddong at openjdk.java.net Tue Nov 30 02:40:06 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Tue, 30 Nov 2021 02:40:06 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 14:39:16 GMT, Jorn Vernee wrote: > I'd suggest also adding an event for `Unsafe::freeMemory`, as well as recording the memory address in both event types. 
With that, it should be possible to match up allocations with frees, and leaks could be identified by looking for allocations that don't have a corresponding free with the same address. Unsafe also support reallocating memory, which complexes the analysis of direct memory leak. And I think we need a mechanism to filter events by allocation size, but AFAIK, there is no general mechanism to achieve it at present. ------------- PR: https://git.openjdk.java.net/jdk/pull/6591 From coleenp at openjdk.java.net Tue Nov 30 02:45:34 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 30 Nov 2021 02:45:34 GMT Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark Message-ID: This change seems to keep the test case in the bug from crashing in the ResourceMark destructor. We have a ResourceMark during stack walking in AsyncGetCallTrace. Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk. Testing tier1-6 in progress. ------------- Commit messages: - 8265150: AsyncGetCallTrace crashes on ResourceMark Changes: https://git.openjdk.java.net/jdk/pull/6606/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6606&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8265150 Stats: 9 lines in 2 files changed: 0 ins; 3 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/6606.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6606/head:pull/6606 PR: https://git.openjdk.java.net/jdk/pull/6606 From jiefu at openjdk.java.net Tue Nov 30 03:42:09 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 30 Nov 2021 03:42:09 GMT Subject: Withdrawn: 8277652: SIGSEGV in ShenandoahBarrierC2Support::verify_raw_mem for malformed control flow graph In-Reply-To: References: Message-ID: On Tue, 23 Nov 2021 15:59:00 GMT, Jie Fu wrote: > Hi all, > > `ShenandoahBarrierC2Support::verify_raw_mem` crashes due to `u->unique_ctrl_out()` [1] returns NULL for malformed control flow graph. 
> It can be reproduced by running `compiler/vectorapi/TestIntrinsicBailOut.java` with `-XX:+UseShenandoahGC`. > It would be better to fix it. > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L1925 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/6525 From jiefu at openjdk.java.net Tue Nov 30 03:42:09 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 30 Nov 2021 03:42:09 GMT Subject: RFR: 8277652: SIGSEGV in ShenandoahBarrierC2Support::verify_raw_mem for malformed control flow graph In-Reply-To: <81o2YKFQvTE2C9qqBBDBjC5L1dNyPMRTJw1CcTdD2SA=.6946cbb3-de08-46b7-9724-7c39a989efc3@github.com> References: <81o2YKFQvTE2C9qqBBDBjC5L1dNyPMRTJw1CcTdD2SA=.6946cbb3-de08-46b7-9724-7c39a989efc3@github.com> Message-ID: On Tue, 23 Nov 2021 16:20:56 GMT, Roman Kennke wrote: > Thank you, Jie! I am currently working on a change that would make LRB runtime call not consume or produce raw memory at all, and would obsolete your change. See #6526 . Thanks @rkennke for fixing it. So it's time to close this pr. ------------- PR: https://git.openjdk.java.net/jdk/pull/6525 From dholmes at openjdk.java.net Tue Nov 30 04:43:06 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 30 Nov 2021 04:43:06 GMT Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore wrote: > This change seems to keep the test case in the bug from crashing in the ResourceMark destructor. We have a ResourceMark during stack walking in AsyncGetCallTrace. Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk. > Testing tier1-6 in progress. Hi Coleen, This bypasses the currently observed problem, but we still have a fundamentally unsafe mechanism in use here. :( Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6606 From stuefe at openjdk.java.net Tue Nov 30 06:07:02 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 30 Nov 2021 06:07:02 GMT Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore wrote: > This change seems to keep the test case in the bug from crashing in the ResourceMark destructor. We have a ResourceMark during stack walking in AsyncGetCallTrace. Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk. > Testing tier1-6 in progress. LGTM ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6606 From eosterlund at openjdk.java.net Tue Nov 30 06:07:02 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 30 Nov 2021 06:07:02 GMT Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore wrote: > This change seems to keep the test case in the bug from crashing in the ResourceMark destructor. We have a ResourceMark during stack walking in AsyncGetCallTrace. Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk. > Testing tier1-6 in progress. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6606 From stuefe at openjdk.java.net Tue Nov 30 06:07:03 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 30 Nov 2021 06:07:03 GMT Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 04:39:58 GMT, David Holmes wrote: > Hi Coleen, > > This bypasses the currently observed problem, but we still have a fundamentally unsafe mechanism in use here. :( > > Thanks, David Does AsyncGetCallTrace get triggered asynchronously via signal? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6606 From dholmes at openjdk.java.net Tue Nov 30 06:24:02 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 30 Nov 2021 06:24:02 GMT Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 06:02:08 GMT, Thomas Stuefe wrote: > > Hi Coleen, > > This bypasses the currently observed problem, but we still have a fundamentally unsafe mechanism in use here. :( > > Thanks, David > > Does AsyncGetCallTrace get triggered asynchronously via signal? Yes: ```V [libjvm.so+0x986023] AsyncGetCallTrace+0x1e5 C [libasyncProfiler.so+0x89b4] Profiler::getJavaTraceAsync(void*, ASGCT_CallFrame*, int)+0xd4 C [libasyncProfiler.so+0x9242] Profiler::recordSample(void*, unsigned long long, int, Event*)+0xd2 C [libasyncProfiler.so+0x34f2c] PerfEvents::signalHandler(int, siginfo_t*, void*)+0x8c ------------- PR: https://git.openjdk.java.net/jdk/pull/6606 From duke at openjdk.java.net Tue Nov 30 07:18:36 2021 From: duke at openjdk.java.net (xpbob) Date: Tue, 30 Nov 2021 07:18:36 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v3] In-Reply-To: References: Message-ID: > Unsafe is used in many Java frameworks. > When the framework has a unsafe memory leak , there is no way to know what code is causing it. > Add unsafe allocation event to jfr. > Records the size and stack allocated. 
> This event is off by default xpbob has updated the pull request incrementally with one additional commit since the last revision: add free and Reallocate event ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6591/files - new: https://git.openjdk.java.net/jdk/pull/6591/files/d883f62d..f883847f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=01-02 Stats: 53 lines in 4 files changed: 49 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/6591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591 PR: https://git.openjdk.java.net/jdk/pull/6591 From stuefe at openjdk.java.net Tue Nov 30 07:21:06 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 30 Nov 2021 07:21:06 GMT Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore wrote: > This change seems to keep the test case in the bug from crashing in the ResourceMark destructor. We have a ResourceMark during stack walking in AsyncGetCallTrace. Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk. > Testing tier1-6 in progress. > > > Hi Coleen, > > > This bypasses the currently observed problem, but we still have a fundamentally unsafe mechanism in use here. :( > > > Thanks, David > > > > > > Does AsyncGetCallTrace get triggered asynchronously via signal? > > Yes: > > ```v > C [libasyncProfiler.so+0x89b4] Profiler::getJavaTraceAsync(void*, ASGCT_CallFrame*, int)+0xd4 > C [libasyncProfiler.so+0x9242] Profiler::recordSample(void*, unsigned long long, int, Event*)+0xd2 > C [libasyncProfiler.so+0x34f2c] PerfEvents::signalHandler(int, siginfo_t*, void*)+0x8c > ``` What you could do is keep (on demand only) a secondary resource area per thread. 
On entering a context that may have been called by a signal handler, and with the current resource area in an unknown state, swap the current resource area pointer in Thread with that prepared secondary resource area, and upon leaving swap back. That way you never touch the original resource area. Kind of like double buffering for signal contexts. ------------- PR: https://git.openjdk.java.net/jdk/pull/6606 From duke at openjdk.java.net Tue Nov 30 07:27:27 2021 From: duke at openjdk.java.net (xpbob) Date: Tue, 30 Nov 2021 07:27:27 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v4] In-Reply-To: References: Message-ID: > Unsafe is used in many Java frameworks. > When the framework has a unsafe memory leak , there is no way to know what code is causing it. > Add unsafe allocation event to jfr. > Records the size and stack allocated. > This event is off by default xpbob has updated the pull request incrementally with one additional commit since the last revision: remove whitespace ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6591/files - new: https://git.openjdk.java.net/jdk/pull/6591/files/f883847f..790cc817 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591 PR: https://git.openjdk.java.net/jdk/pull/6591 From duke at openjdk.java.net Tue Nov 30 08:00:11 2021 From: duke at openjdk.java.net (xpbob) Date: Tue, 30 Nov 2021 08:00:11 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v4] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 07:27:27 GMT, xpbob wrote: >> Unsafe is used in many Java frameworks. >> When the framework has a unsafe memory leak , there is no way to know what code is causing it. >> Add unsafe allocation event to jfr. 
>> Records the size and stack allocated. >> This event is off by default > > xpbob has updated the pull request incrementally with one additional commit since the last revision: > > remove whitespace Thanks I added 3 events |event|stack|addr|size| |-|-|-|-| |Allocation|true|alloc|true| |Reallocate|true|before realloc,after realloc|true| |Free|true|free addr|false| ------------- PR: https://git.openjdk.java.net/jdk/pull/6591 From aph at openjdk.java.net Tue Nov 30 10:24:02 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 30 Nov 2021 10:24:02 GMT Subject: RFR: 8277948: AArch64: Print the correct native stack if -XX:+PreserveFramePointer when crash In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 17:40:43 GMT, Denghui Dong wrote: > Hi, > > I found that the native stack frames in the hs log are not accurate sometimes on AArch64, not sure if this is a known issue or an issue worth fixing. > > The following steps can quick reproduce the problem: > > 1. apply the diff(comment the dtrace_object_alloc call in interpreter and make a crash on SharedRuntime::dtrace_object_alloc) > > index 39e99bdd5ed..4fc768e94aa 100644 > --- a/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp > @@ -3558,6 +3558,7 @@ void TemplateTable::_new() { > __ store_klass_gap(r0, zr); // zero klass gap for compressed oops > __ store_klass(r0, r4); // store klass last > > +/** > { > SkipIfEqual skip(_masm, &DTraceAllocProbes, false); > // Trigger dtrace event for fastpath > @@ -3567,6 +3568,7 @@ void TemplateTable::_new() { > __ pop(atos); // restore the return value > > } > +*/ > __ b(done); > } > > diff --git a/src/hotspot/cpu/x86/templateTable_x86.cpp b/src/hotspot/cpu/x86/templateTable_x86.cpp > index 19530b7c57c..15b0509da4c 100644 > --- a/src/hotspot/cpu/x86/templateTable_x86.cpp > +++ b/src/hotspot/cpu/x86/templateTable_x86.cpp > @@ -4033,6 +4033,7 @@ void TemplateTable::_new() { > Register tmp_store_klass = 
LP64_ONLY(rscratch1) NOT_LP64(noreg); > __ store_klass(rax, rcx, tmp_store_klass); // klass > > +/** > { > SkipIfEqual skip_if(_masm, &DTraceAllocProbes, 0); > // Trigger dtrace event for fastpath > @@ -4041,6 +4042,7 @@ void TemplateTable::_new() { > CAST_FROM_FN_PTR(address, static_cast(SharedRuntime::dtrace_object_alloc)), rax); > __ pop(atos); > } > +*/ > > __ jmp(done); > } > diff --git a/src/hotspot/share/runtime/sharedRuntime.cpp b/src/hotspot/share/runtime/sharedRuntime.cpp > index a5de65ea5ab..60b4bd3bcc8 100644 > --- a/src/hotspot/share/runtime/sharedRuntime.cpp > +++ b/src/hotspot/share/runtime/sharedRuntime.cpp > @@ -1002,6 +1002,7 @@ jlong SharedRuntime::get_java_tid(Thread* thread) { > * 6254741. Once that is fixed we can remove the dummy return value. > */ > int SharedRuntime::dtrace_object_alloc(oopDesc* o) { > + *(int*)0 = 1; > return dtrace_object_alloc(Thread::current(), o, o->size()); > } > > > 2. `java -XX:+DTraceAllocProbes -Xcomp -XX:-PreserveFramePointer -version` > > On x86_64, the native stack in hs log is complete, but in AArch64, the native stack is incorrect. > > In the beginning, I thought it might be the influence of PreserveFramePointer. Later, I found that no matter whether PreserveFramePointer is enabled or not, in the hs log of x86_64, the native stack is always correct, and aarch64 is wrong. > > After some investigation, I found that this problem is related to the layout of the stack. > > On x86_64, whether it is C/C++, interpreter, or JIT, `callee` will always put the `return address` and `fp` of the `caller` at the bottom of the stack. > Hence, `callee` can always get the `caller sp`(aka `sender sp`) by `fp + 2`, and if `caller` is a compiled method, `caller sp` is the key to getting the `caller`'s `caller` since `caller fp` may be invalid.(see frame::sender_for_compiled_frame). 
> > > push %rbp > mov %rsp,%rbp > > _ _ _ _ _ _ > | | > | | | > |_ _ _ _ _ _| | > | | | > caller | | <- caller sp | > _ _ _ |_ _ _ _ _ _| | expand > | | | > | ret addr | | direction > callee |_ _ _ _ _ _| | > | | V > | caller fp | <- fp > |_ _ _ _ _ _| > > > > But for AArch64, the C/C++ code doesn't put the `return address` and `fp` of the `caller` at the bottom of the stack. > Hence, we cannot use `fp + 2` to calculate the proper `caller sp`(although it is still implemented this way). > > When `caller` is a C1/C2 method A, and `callee` a C/C++ method B, we cannot get the `caller` of A since we cannot get the proper sp value of it. > > > stp x29, x30, [sp, #-N]! > mov x29, sp > > _ _ _ _ _ _ > | | > | | | > |_ _ _ _ _ _| | > | | | > caller | | <- caller sp | > _ _ _ |_ _ _ _ _ _| - | expand > | | > . . . . . | | direction > _ _ _ _ _ _ | | > | | | N | > | ret addr | | | > callee |_ _ _ _ _ _| | | > | | - V > | caller fp | <- fp > |_ _ _ _ _ _| > > > > I am not very familiar with AArch64 and have no idea how to fix this issue perfectly at current. > > Based on my understanding of the implementation, we can get the correct stack trace when PreserveFramePointer is enabled. > > Although PreserveFramePointer is disabled by default, I found that some real applications will enable it in the production environment. > Therefore, in my opinion, this fix can help troubleshoot crash issues in applications that enable PreserveFramePointer on AArch64 platform. > > This patch changes the logic of l_sender_sp calculation, uses sender_sp() as the value of l_sender_sp when PreserveFramePointer is enabled. > > Any input is appreciated. > > Thanks, > Denghui Thank you for this. I'll have a look. Stack unwinding on AArch64 C/C++ uses call frame information, which is in a separate section in the binary file. This allows the stack to be fully traced, even if there is no frame pointer. There is a library, libunwind, which does this. 
But that won't work with Java, which has its own way to do it. It would be nice to get -XX:+PreserveFramePointer working correctly.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6597

From shade at openjdk.java.net Tue Nov 30 10:47:55 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 30 Nov 2021 10:47:55 GMT
Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v5]
In-Reply-To: References: Message-ID:

> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler.
>
> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers.
>
> Additional testing:
> - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now passes
> - [x] Linux x86_64 Zero works with `async-profiler`

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace - Fix a comment - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace - More reviews - Review feedback - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace - Initial work: runs async-profiler successfully ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5848/files - new: https://git.openjdk.java.net/jdk/pull/5848/files/bc4ba33b..373f15ae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5848&range=03-04 Stats: 22783 lines in 424 files changed: 13220 ins; 6227 del; 3336 mod Patch: https://git.openjdk.java.net/jdk/pull/5848.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5848/head:pull/5848 PR: https://git.openjdk.java.net/jdk/pull/5848 From duke at openjdk.java.net Tue Nov 30 11:04:44 2021 From: duke at openjdk.java.net (xpbob) Date: Tue, 30 Nov 2021 11:04:44 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5] In-Reply-To: References: Message-ID: > Unsafe is used in many Java frameworks. > When the framework has a unsafe memory leak , there is no way to know what code is causing it. > Add unsafe allocation event to jfr. > Records the size and stack allocated. > This event is off by default xpbob has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8277930 - remove whitespace - add free and Reallocate event - Merge branch 'openjdk:master' into JDK-8277930 - 8277930: Add unsafe allocation event to jfr ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6591/files - new: https://git.openjdk.java.net/jdk/pull/6591/files/790cc817..b09c744d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6591&range=03-04 Stats: 163 lines in 10 files changed: 58 ins; 83 del; 22 mod Patch: https://git.openjdk.java.net/jdk/pull/6591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6591/head:pull/6591 PR: https://git.openjdk.java.net/jdk/pull/6591 From aph at openjdk.java.net Tue Nov 30 11:29:11 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 30 Nov 2021 11:29:11 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v5] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 10:47:55 GMT, Aleksey Shipilev wrote: >> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler. >> >> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we t "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers. >> >> Additional testing: >> - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now pass >> - [x] Linux x86_64 Zero works with `async-profiler` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace > - Fix a comment > - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace > - More reviews > - Review feedback > - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace > - Initial work: runs async-profiler successfully src/hotspot/cpu/zero/frame_zero.cpp line 139: > 137: assert(is_interpreted_frame(), "Not an interpreted frame"); > 138: // These are reasonable sanity checks > 139: if (fp() == 0 || (intptr_t(fp()) & (wordSize-1)) != 0) { Use `is_aligned()` here? ------------- PR: https://git.openjdk.java.net/jdk/pull/5848 From egahlin at openjdk.java.net Tue Nov 30 11:37:13 2021 From: egahlin at openjdk.java.net (Erik Gahlin) Date: Tue, 30 Nov 2021 11:37:13 GMT Subject: RFR: 8277930: Add unsafe allocation event to jfr [v5] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 11:04:44 GMT, xpbob wrote: >> Unsafe is used in many Java frameworks. >> When the framework has a unsafe memory leak , there is no way to know what code is causing it. >> Add unsafe allocation event to jfr. >> Records the size and stack allocated. >> This event is off by default > > xpbob has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8277930 > - remove whitespace > - add free and Reallocate event > - Merge branch 'openjdk:master' into JDK-8277930 > - 8277930: Add unsafe allocation event to jfr What about overhead (if JFR is disabled)? This looks like it could be a hot path for some applications. 
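
To make the overhead concern concrete, here is a minimal sketch of the usual shape of a guarded event. It is a hypothetical model, not the patch under review and not the JFR implementation: `g_event_enabled`, `capture_stack`, `on_unsafe_allocate`, and the other names are invented for illustration. The point it shows is that a cheap enabled check can come before any expensive work such as a stack walk, so the disabled path costs roughly one relaxed atomic load and a branch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for "this event is enabled in a running recording".
static std::atomic<bool> g_event_enabled{false};

// Counts how often the expensive path ran; exists only to observe the sketch.
static std::atomic<std::uint64_t> g_stack_walks{0};

struct AllocationSample {
  std::size_t size;                // requested allocation size in bytes
  std::vector<const void*> stack;  // placeholder for captured frames
};

// Hypothetical expensive operation; a real stack walk costs far more.
static std::vector<const void*> capture_stack(std::size_t max_frames) {
  assert(max_frames > 0);
  g_stack_walks.fetch_add(1, std::memory_order_relaxed);
  return std::vector<const void*>(max_frames, nullptr);
}

// Recorded samples. Not thread-safe; kept simple for the sketch.
static std::vector<AllocationSample> g_samples;

// Called on every allocation. Returns true if a sample was recorded.
bool on_unsafe_allocate(std::size_t size) {
  // Fast path: one relaxed load and a branch when the event is off.
  if (!g_event_enabled.load(std::memory_order_relaxed)) {
    return false;
  }
  // Slow path: only now pay for the stack capture and bookkeeping.
  g_samples.push_back(AllocationSample{size, capture_stack(8)});
  return true;
}
```

Whether the actual change keeps the disabled path this cheap is exactly what a benchmark with JFR off would need to confirm; the sketch only illustrates the pattern being asked about.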
------------- PR: https://git.openjdk.java.net/jdk/pull/6591 From smonteith at openjdk.java.net Tue Nov 30 11:41:14 2021 From: smonteith at openjdk.java.net (Stuart Monteith) Date: Tue, 30 Nov 2021 11:41:14 GMT Subject: RFR: 8277893: Arraycopy stress tests In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 13:28:33 GMT, Aleksey Shipilev wrote: > I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. > > A brief tour of these tests: > > - Tests all data types; > - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; > - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; > - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; > - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; > > My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. > > Test times: > > > # x86_64 (TR 3970X) > real 9m11.037s > user 78m2.766s > sys 0m19.873s > > # x86_32 (TR 3970X) > real 13m39.054s > user 147m38.308s > sys 0m10.924s > > # x86_64 (i5-11500) > real 41m32.622s > user 447m19.986s > sys 0m21.026s > > # AArch64 (ThunderX2) > real 5m34.210s > user 45m16.015s > sys 0m24.723s > > > Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. 
> > Additional testing: > - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` > - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` > - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` This looks great, thanks Aleksey. This covers all of the cases I'd reasonably expect to see covered. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From duke at openjdk.java.net Tue Nov 30 12:43:35 2021 From: duke at openjdk.java.net (Vishal Chand) Date: Tue, 30 Nov 2021 12:43:35 GMT Subject: RFR: 8277372: Add getters for BOT and card table members [v3] In-Reply-To: References: Message-ID: > Changed the visibility, added getters and refactored the following: > > 1. Card Table Members > 2. BOT members > 3. ObjectStartArray block members Vishal Chand has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge branch 'master' into JDK-8277372-refactor - Refactoring in hotspot/cpu dir - Initial patch ------------- Changes: https://git.openjdk.java.net/jdk/pull/6570/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=02 Stats: 223 lines in 40 files changed: 46 ins; 11 del; 166 mod Patch: https://git.openjdk.java.net/jdk/pull/6570.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6570/head:pull/6570 PR: https://git.openjdk.java.net/jdk/pull/6570 From shade at openjdk.java.net Tue Nov 30 13:01:13 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 30 Nov 2021 13:01:13 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v5] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 11:26:04 GMT, Andrew Haley wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace >> - Fix a comment >> - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace >> - More reviews >> - Review feedback >> - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace >> - Initial work: runs async-profiler successfully > > src/hotspot/cpu/zero/frame_zero.cpp line 139: > >> 137: assert(is_interpreted_frame(), "Not an interpreted frame"); >> 138: // These are reasonable sanity checks >> 139: if (fp() == 0 || (intptr_t(fp()) & (wordSize-1)) != 0) { > > Use `is_aligned()` here? I could, but this matches what other platforms are doing in their `frame::is_interpreted_frame_valid()`. If there are no other fixes needed, okay if I keep this one in place? Otherwise, I would need to re-test the whole thing for a minor touchup, which is tedious. ------------- PR: https://git.openjdk.java.net/jdk/pull/5848 From duke at openjdk.java.net Tue Nov 30 13:39:41 2021 From: duke at openjdk.java.net (Vishal Chand) Date: Tue, 30 Nov 2021 13:39:41 GMT Subject: RFR: 8277372: Add getters for BOT and card table members [v4] In-Reply-To: References: Message-ID: <3ux2lUBwsHGYTsBe0jE0nvKWWEljd9VH2IdLwp0utNw=.7bf0bebb-6aca-4132-aadd-1113e657a6da@github.com> > Changed the visibility, added getters and refactored the following: > > 1. Card Table Members > 2. BOT members > 3. 
ObjectStartArray block members Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: Rename BOTConstants ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6570/files - new: https://git.openjdk.java.net/jdk/pull/6570/files/69ee4a32..48828873 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6570&range=02-03 Stats: 93 lines in 9 files changed: 0 ins; 6 del; 87 mod Patch: https://git.openjdk.java.net/jdk/pull/6570.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6570/head:pull/6570 PR: https://git.openjdk.java.net/jdk/pull/6570 From eric.caspole at oracle.com Tue Nov 30 16:00:00 2021 From: eric.caspole at oracle.com (eric.caspole at oracle.com) Date: Tue, 30 Nov 2021 11:00:00 -0500 Subject: RFR: 8277358: Accelerate CRC32-C In-Reply-To: References: Message-ID: Hi Scott, is there a JMH for this or would an existing zip JMH benefit from this change? If there is already one, great, otherwise could you add one? Thanks, Eric On 11/29/21 9:52 AM, Scott Gibbons wrote: > Accelerates CRC32-C by utilizing vpclmulqdq similarly to CRC32. This change achieves ~4x throughput improvement. 
> > 5986.947899319073 MB/s => 24041.05203089616 MB/s > 5840.02689336947 MB/s => 24898.781468710356 MB/s > > ********** Original *********** > > > scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 > offset = 0 > msgSize = 512 bytes > iters = 20000000 > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(byte[]) runtime = 1.710387358 seconds > CRC32C.update(byte[]) throughput = 5986.947899319073 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(ByteBuffer) runtime = 1.753416583 seconds > CRC32C.update(ByteBuffer) throughput = 5840.02689336947 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > > > > > *********** With my changes: ************* > > > > scottgi at 96974-ICX32:~/crc/jdk (asgibbons-crc32c)$ java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java 20000000 > offset = 0 > msgSize = 512 bytes > iters = 20000000 > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(byte[]) runtime = 0.425938099 seconds > CRC32C.update(byte[]) throughput = 24041.05203089616 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > CRC32C.update(ByteBuffer) runtime = 0.411265106 seconds > CRC32C.update(ByteBuffer) throughput = 24898.781468710356 MB/s > CRCs: crc = ae10ee5a, crcReference = ae10ee5a > ------------------------------------------------------- > > ------------- > > Commit messages: > - Merge branch 'asgibbons-crc32c' of https://github.com/asgibbons/jdk into asgibbons-crc32c > - Merge branch 'master' into asgibbons-crc32c > - Asgibbons crc32c (#7) > - Merge branch 
'openjdk:master' into master > - Revert .gitignore change > - Move register save to within conditional; add comments > - Bad merge. > - Merge branch 'asgibbons-crc32c' of https://github.com/asgibbons/jdk into asgibbons-crc32c > - ZZMerge branch 'asgibbons-crc32c' of https://github.com/asgibbons/jdk into asgibbons-crc32c > - Use existing CRC32 code with different table for CRC32-C > - ... and 203 more: https://git.openjdk.java.net/jdk/compare/e9b36a83...10aeaec6 > > Changes: https://git.openjdk.java.net/jdk/pull/6595/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6595&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8277358 > Stats: 62 lines in 4 files changed: 40 ins; 1 del; 21 mod > Patch: https://git.openjdk.java.net/jdk/pull/6595.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/6595/head:pull/6595 > > PR: https://git.openjdk.java.net/jdk/pull/6595 From tschatzl at openjdk.java.net Tue Nov 30 16:16:10 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 30 Nov 2021 16:16:10 GMT Subject: RFR: 8277372: Add getters for BOT and card table members [v4] In-Reply-To: <3ux2lUBwsHGYTsBe0jE0nvKWWEljd9VH2IdLwp0utNw=.7bf0bebb-6aca-4132-aadd-1113e657a6da@github.com> References: <3ux2lUBwsHGYTsBe0jE0nvKWWEljd9VH2IdLwp0utNw=.7bf0bebb-6aca-4132-aadd-1113e657a6da@github.com> Message-ID: On Tue, 30 Nov 2021 13:39:41 GMT, Vishal Chand wrote: >> Changed the visibility, added getters and refactored the following: >> >> 1. Card Table Members >> 2. BOT members >> 3. ObjectStartArray block members > > Vishal Chand has updated the pull request incrementally with one additional commit since the last revision: > > Rename BOTConstants Getting good :) Some minor comments. 
src/hotspot/share/gc/g1/g1BlockOffsetTable.cpp line 290: > 288: assert(_bot->offset_array(j) > 0 && > 289: _bot->offset_array(j) <= > 290: (u_char) (BOTConstants::bot_card_size_words()+BOTConstants::N_powers-1), Suggestion: (u_char) (BOTConstants::bot_card_size_words() + BOTConstants::N_powers - 1), Pre-existing: operator has no spaces around it src/hotspot/share/gc/g1/g1BlockOffsetTable.cpp line 295: > 293: (uint) _bot->offset_array(j), > 294: (uint) _bot->offset_array(j), > 295: (uint) (BOTConstants::bot_card_size_words()+BOTConstants::N_powers-1)); Suggestion: (uint) (BOTConstants::bot_card_size_words() + BOTConstants::N_powers - 1)); Pre-existing: spaces around operator src/hotspot/share/gc/parallel/objectStartArray.hpp line 52: > 50: static uint _block_size; > 51: static uint _block_size_in_words; > 52: Almost the same naming issue as in the `BlockOffsetTable/SharedArray`; I would prefer if these members (and getters) here were named similarly to the ones there. It is true that `ObjectStartArray` and `BlockOffsetTable` are basically the same thing, but any eventual merge is another issue. src/hotspot/share/gc/shared/cardTable.cpp line 416: > 414: dirty_cards++, next_entry++); > 415: MemRegion cur_cards(addr_for(cur_entry), > 416: dirty_cards*_card_size_in_words); Suggestion: dirty_cards * _card_size_in_words); Pre-existing: spaces around operator src/hotspot/share/gc/shared/cardTable.cpp line 442: > 440: dirty_cards++, next_entry++); > 441: MemRegion cur_cards(addr_for(cur_entry), > 442: dirty_cards*_card_size_in_words); Suggestion: dirty_cards * _card_size_in_words); Pre-existing: spaces around operator ------------- Changes requested by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/6570 From sviswanathan at openjdk.java.net Tue Nov 30 16:47:04 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 30 Nov 2021 16:47:04 GMT Subject: RFR: 8277617: Adjust AVX3Threshold for copy/fill stubs [v6] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 00:10:39 GMT, Sandhya Viswanathan wrote: >> Currently 32-byte instructions are used for small array copy and clear. >> This can be optimized by using 64-byte instructions. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace @neliasso Could you please also review this small patch. I would like to get it integrated before JDK 18 feature freeze. ------------- PR: https://git.openjdk.java.net/jdk/pull/6512 From kvn at openjdk.java.net Tue Nov 30 19:29:07 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 30 Nov 2021 19:29:07 GMT Subject: RFR: 8277893: Arraycopy stress tests In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 13:28:33 GMT, Aleksey Shipilev wrote: > I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable. 
> > A brief tour of these tests: > > - Tests all data types; > - Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc; > - Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases; > - Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to `-XX:UseAVX=0 -XX:UseSSE=0` cases; > - Tests with/without compressed oops mode -- theoretically only needed for `Object` copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types; > > My previous version used individual `@run` clauses for all configurations, but I think the Java driver is cleaner and easier to maintain. > > Test times: > > > # x86_64 (TR 3970X) > real 9m11.037s > user 78m2.766s > sys 0m19.873s > > # x86_32 (TR 3970X) > real 13m39.054s > user 147m38.308s > sys 0m10.924s > > # x86_64 (i5-11500) > real 41m32.622s > user 447m19.986s > sys 0m21.026s > > # AArch64 (ThunderX2) > real 5m34.210s > user 45m16.015s > sys 0m24.723s > > > Since these tests are quite long, especially on small machines, I hooked them up to `hotspot:tier3`. > > Additional testing: > - [x] Linux x86_64 fastdebug `compiler/stress/arraycopy` > - [x] Linux x86_32 fastdebug `compiler/stress/arraycopy` > - [x] Linux AArch64 fastdebug `compiler/stress/arraycopy` I assume that `test/micro/org/openjdk/bench/java/lang` micros cover all these cases. Otherwise you may need to add some. test/hotspot/jtreg/TEST.groups line 183: > 181: > 182: tier3_compiler = \ > 183: compiler/arraycopy/stress Can you introduce separate group for this? For example `hotspot_arraycopy_stress` and use it here. I am fine with introduced `tier2|3_compiler` groups but it will help us in Oracle to have separate group for `arraycopy` so we can schedule its testing on proper machines. 
test/hotspot/jtreg/compiler/arraycopy/stress/AbstractStressArrayCopy.java line 32: > 30: * Max array size to test. > 31: */ > 32: static final int MAX_SIZE = 1024*1024 + 1; Do we really need such big arrays for regression testing. It may make sense for JMH but not for these tests I think. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From shade at openjdk.java.net Tue Nov 30 19:38:22 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 30 Nov 2021 19:38:22 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} Message-ID: I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized the whole bunch of compiler tests are running there. Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. Which means that many compiler tests are not running regularly for many contributors. But these tests are rather fast themselves and cover important compiler features. We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff goes to tier3. 
Sample times for new subgroups (think about this as "How much time they add to existing tiers"): ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:test/hotspot/jtreg:tier2_compiler 243 243 0 0 ============================== real 2m16.518s user 35m40.839s sys 1m35.334s ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:test/hotspot/jtreg:tier3_compiler 132 132 0 0 ============================== real 4m31.935s user 71m54.617s sys 2m13.073s ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/6622/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6622&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278016 Stats: 43 lines in 1 file changed: 43 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6622.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6622/head:pull/6622 PR: https://git.openjdk.java.net/jdk/pull/6622 From shade at openjdk.java.net Tue Nov 30 20:29:05 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 30 Nov 2021 20:29:05 GMT Subject: RFR: 8277893: Arraycopy stress tests In-Reply-To: References: Message-ID: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> On Tue, 30 Nov 2021 19:25:41 GMT, Vladimir Kozlov wrote: > I assume that `test/micro/org/openjdk/bench/java/lang` micros cover all these cases. Otherwise you may need to add some. Yes. Performance tests will come separately. This PR covers purely functional tests that verify arraycopies are not foobar-ing array contents, not hitting any asserts, or otherwise crash VMs. Performance tests would run on a limited set of inputs and in `release` bits, so they are bad for verification like this :) > test/hotspot/jtreg/TEST.groups line 183: > >> 181: >> 182: tier3_compiler = \ >> 183: compiler/arraycopy/stress > > Can you introduce separate group for this? 
For example `hotspot_arraycopy_stress` and use it here. > I am fine with introduced `tier2|3_compiler` groups but it will help us in Oracle to have separate group for `arraycopy` so we can schedule its testing on proper machines. Yes, we can. Actually, working on #6622, I realized these test groups would be introduced anyway. So these new arraycopy tests should probably go to `hotspot_slow_compiler` group, along with other `stress` tests. This would hook arraycopy tests into `hotspot:tier3` automatically if #6622 lands. Tell me if you still want a completely separate test group, or `hotspot_slow_compiler` is enough for current Oracle testing infra. > test/hotspot/jtreg/compiler/arraycopy/stress/AbstractStressArrayCopy.java line 32: > >> 30: * Max array size to test. >> 31: */ >> 32: static final int MAX_SIZE = 1024*1024 + 1; > > Do we really need such big arrays for regression testing. It may make sense for JMH but not for these tests I think. My original intent was to make sure the tests cross all small page sizes (up to 64K) and maybe even some large page sizes (1M `long[]` is 8M, so 2*4M). The size of this array does not matter for test performance very much, since we only allocate two `MAX_SIZE`-d arrays per entire run. Driver even caps the heap size at `-Xmx256m` to block tests from using too much memory. So, I'd leave it at 1M, if you agree. 
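To make the size argument above concrete, here is a rough sketch of the arithmetic (not part of the PR). The class and method names are made up for illustration, and the calculation counts only element payload, ignoring the array header, which is an assumption that does not change the picture at these sizes.

```java
public class ArraySizeSketch {
    // Same value as MAX_SIZE in the quoted AbstractStressArrayCopy snippet.
    static final int MAX_SIZE = 1024 * 1024 + 1;

    // Payload bytes of a MAX_SIZE-element array.
    // Assumption: the object header is ignored.
    static long payloadBytes(int elementBytes) {
        return (long) MAX_SIZE * elementBytes;
    }

    // Number of whole pages of the given size that the payload covers.
    static long pagesSpanned(int elementBytes, long pageBytes) {
        return payloadBytes(elementBytes) / pageBytes;
    }

    public static void main(String[] args) {
        int[] elementSizes = {1, 2, 4, 8};                // byte, short/char, int/float, long/double
        long[] pageSizes = {4L << 10, 64L << 10, 2L << 20, 4L << 20};
        for (int es : elementSizes) {
            for (long ps : pageSizes) {
                System.out.printf("element=%dB page=%dK -> %d bytes, %d whole pages%n",
                        es, ps >> 10, payloadBytes(es), pagesSpanned(es, ps));
            }
        }
    }
}
```

Even the `byte[]` case covers sixteen 64K pages, and the `long[]` case covers two 4M pages, which matches the "1M `long[]` is 8M, so 2*4M" estimate.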
------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From kvn at openjdk.java.net Tue Nov 30 20:39:10 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 30 Nov 2021 20:39:10 GMT Subject: RFR: 8277893: Arraycopy stress tests In-Reply-To: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> References: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> Message-ID: On Tue, 30 Nov 2021 20:21:19 GMT, Aleksey Shipilev wrote: >> test/hotspot/jtreg/compiler/arraycopy/stress/AbstractStressArrayCopy.java line 32: >> >>> 30: * Max array size to test. >>> 31: */ >>> 32: static final int MAX_SIZE = 1024*1024 + 1; >> >> Do we really need such big arrays for regression testing. It may make sense for JMH but not for these tests I think. > > My original intent was to make sure the tests cross all small page sizes (up to 64K) and maybe even some large page sizes (1M `long[]` is 8M, so 2*4M). The size of this array does not matter for test performance very much, since we only allocate two `MAX_SIZE`-d arrays per entire run. Driver even caps the heap size at `-Xmx256m` to block tests from using too much memory. So, I'd leave it at 1M, if you agree. Okay. I was concerned because of the times you show. I am fine with running tests up to 10-15 mins but not this: # x86_64 (i5-11500) real 41m32.622s user 447m19.986s sys 0m21.026s Do you know why it takes so much time on it? ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From shade at openjdk.java.net Tue Nov 30 20:44:43 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 30 Nov 2021 20:44:43 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2] In-Reply-To: References: Message-ID: > I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized the whole bunch of compiler tests are running there.
> > Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. Which means that many compiler tests are not running regularly for many contributors. But these tests are rather fast themselves and cover important compiler features. > > We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff goes to tier3. > > Sample times for new subgroups (think about this as "How much time they add to existing tiers"): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier2_compiler 243 243 0 0 > ============================== > > real 2m16.518s > user 35m40.839s > sys 1m35.334s > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier3_compiler 132 132 0 0 > ============================== > > real 4m31.935s > user 71m54.617s > sys 2m13.073s Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Filter out tier1/2 groups too ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6622/files - new: https://git.openjdk.java.net/jdk/pull/6622/files/d027cbe0..3a15f32b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6622&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6622&range=00-01 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6622.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6622/head:pull/6622 PR: https://git.openjdk.java.net/jdk/pull/6622 From kvn at openjdk.java.net Tue Nov 30 20:49:03 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 30 Nov 2021 20:49:03 GMT Subject: RFR: 8277893: Arraycopy stress tests In-Reply-To: 
<86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> References: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> Message-ID: On Tue, 30 Nov 2021 20:23:04 GMT, Aleksey Shipilev wrote: >> test/hotspot/jtreg/TEST.groups line 183: >> >>> 181: >>> 182: tier3_compiler = \ >>> 183: compiler/arraycopy/stress >> >> Can you introduce separate group for this? For example `hotspot_arraycopy_stress` and use it here. >> I am fine with introduced `tier2|3_compiler` groups but it will help us in Oracle to have separate group for `arraycopy` so we can schedule its testing on proper machines. > > Yes, we can. Actually, working on #6622, I realized these test groups would be introduced anyway. So these new arraycopy tests should probably go to `hotspot_slow_compiler` group, along with other `stress` tests. This would hook arraycopy tests into `hotspot:tier3` automatically if #6622 lands. Tell me if you still want a completely separate test group, or `hotspot_slow_compiler` is enough for current Oracle testing infra. Please, create separate test group and add it to `hotspot_slow_compiler`. We would not need to change infra settings if more testing is added to this new group later. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From kvn at openjdk.java.net Tue Nov 30 21:01:08 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 30 Nov 2021 21:01:08 GMT Subject: RFR: 8278016: Add compiler tests to tier{2,3} [v2] In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 20:44:43 GMT, Aleksey Shipilev wrote: >> I have been looking at `hotspot:tier4` (catch-all not in lower tiers) run logs, and realized the whole bunch of compiler tests are running there. >> >> Since `hotspot:tier4` runs a lot of `vmTestbase` tests, contributors seldom run it, as it takes many hours. Which means that many compiler tests are not running regularly for many contributors. 
But these tests are rather fast themselves and cover important compiler features. >> >> We can properly add compiler tests to `tier{2,3}` to expose them on earlier tiers. The split logic between tiers is roughly: fast feature tests go into tier2, slower feature tests and debugging/printing stuff goes to tier3. >> >> Sample times for new subgroups (think about this as "How much time they add to existing tiers"): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2_compiler 243 243 0 0 >> ============================== >> >> real 2m16.518s >> user 35m40.839s >> sys 1m35.334s >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3_compiler 132 132 0 0 >> ============================== >> >> real 4m31.935s >> user 71m54.617s >> sys 2m13.073s > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Filter out tier1/2 groups too Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6622 From shade at openjdk.java.net Tue Nov 30 21:25:07 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 30 Nov 2021 21:25:07 GMT Subject: RFR: 8277893: Arraycopy stress tests In-Reply-To: References: <86VRDdE8F6Q0b4CNj2otyPX2z07QA1fBlV0TN0Vn1cs=.43f1fcde-4727-4925-a7fd-51afca9d30cf@github.com> Message-ID: <9L5CHY8n-6csbW9jfsnXt4pSqnabXH5R7dt2pZFDmdA=.e53d343e-f7fd-46b1-a8af-02dba3fad3ec@github.com> On Tue, 30 Nov 2021 20:34:46 GMT, Vladimir Kozlov wrote: >> My original intent was to make sure the tests cross all small page sizes (up to 64K) and maybe even some large page sizes (1M `long[]` is 8M, so 2*4M). The size of this array does not matter for test performance very much, since we only allocate two `MAX_SIZE`-d arrays per entire run. 
Driver even caps the heap size at `-Xmx256m` to block tests from using too much memory. So, I'd leave it at 1M, if you agree. > > Okay. I was concerned because of the times you show. I am fine with running tests up to 10-15 mins but not this: > > # x86_64 (i5-11500) > real 41m32.622s > user 447m19.986s > sys 0m21.026s > > > Do you know why it takes so much time on it? That small machine has very slow memory compared to other ones. The parallelism in stress tests (9 types, 2 forked VMs each) brings that machine to its knees. There is a blurb about that effect here: https://github.com/openjdk/jdk/pull/6594/files#diff-f72fee20a49daaf4e05002372e93f426407ecd429a227393e2ec79e821042c90R40-R47 -- I don't think it would matter much if we trim `MAX_SIZE`, but I'll try tomorrow. ------------- PR: https://git.openjdk.java.net/jdk/pull/6594 From sspitsyn at openjdk.java.net Tue Nov 30 23:23:24 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 30 Nov 2021 23:23:24 GMT Subject: RFR: 8265150: AsyncGetCallTrace crashes on ResourceMark In-Reply-To: References: Message-ID: On Tue, 30 Nov 2021 02:37:47 GMT, Coleen Phillimore wrote: > This change seems to keep the test case in the bug from crashing in the ResourceMark destructor. We have a ResourceMark during stack walking in AsyncGetCallTrace. Also RegisterMap during jvmti shouldn't process oops, fix care of @fisk. > Testing tier1-6 in progress. Hi Coleen, I'm okay with this workaround. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/6606 From sspitsyn at openjdk.java.net Tue Nov 30 23:26:32 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Tue, 30 Nov 2021 23:26:32 GMT Subject: RFR: 8274903: Zero: Support AsyncGetCallTrace [v5] In-Reply-To: References: Message-ID: <9Qsyq7smTvNNP3a7WwtYaMDiGesLCvPWF1FTxArevT0=.aa62eb0d-0fbd-4f5b-a15c-5553cf88ecff@github.com> On Tue, 30 Nov 2021 10:47:55 GMT, Aleksey Shipilev wrote: >> This is a Zero infrastructure improvement that makes Zero VM work with AsyncGetCallTrace, and by extension, async-profiler. >> >> Zero is quite odd in stack management. The "real" stack actually contains the C++ Interpreter and the rest of VM code. The Java stack is reported through the usual "frame" mechanism the rest of VM uses to get the mapping from Template Interpreter, stub, and compiled code. So, to support Java-centric AsyncGetCallTrace, we "only" need Zero to report the proper Java frames from its ZeroStack from the profiling/signal handlers. >> >> Additional testing: >> - [x] Linux x86_64 Zero `serviceability/AsyncGetCallTrace` now passes >> - [x] Linux x86_64 Zero works with `async-profiler` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace > - Fix a comment > - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace > - More reviews > - Review feedback > - Merge branch 'master' into JDK-8274903-zero-asyncgetcalltrace > - Initial work: runs async-profiler successfully Marked as reviewed by sspitsyn (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/5848 From Divino.Cesar at microsoft.com Thu Nov 11 19:24:30 2021 From: Divino.Cesar at microsoft.com (Cesar Soares Lucas) Date: Thu, 11 Nov 2021 19:24:30 -0000 Subject: [External] : Re: RFC - Improving C2 Escape Analysis In-Reply-To: <457a3277-bc96-d481-2a69-4559f25cd52e@oracle.com> References: <20210930140335.648146897@eggemoggin.niobe.net> <415a6622-a46c-33da-8e39-c8f3068c7df3@oracle.com> <44563450-403B-4A15-95AB-5FB5DCA4ED0B@oracle.com> <81f86a0b-dfb7-0b45-1779-49209a82ae40@oracle.com> <0f30507c-e0f0-c380-568b-ac441611e116@oracle.com> <787f8fbb-83e6-0867-1c97-ae2516df114b@oracle.com> <457a3277-bc96-d481-2a69-4559f25cd52e@oracle.com> Message-ID: Hi Vladimir, Thank you for the feedback and sorry for the delay in getting back to you! > Yes, finding a solution for allocation merges (or NULL) is a pain. I spent some > time investigating possible solutions for it but "no cigar". Maybe we do > indeed need control flow analysis to resolve this. Can you elaborate a bit on the approaches you tried and why you didn't like them? By allocation merges, do you mean nested objects like "obj1.obj2.x"? Did you try solving both control-flow merge issues and also allocation merges? > There are 2 test files with small methods for different EA cases I used to > see how EA works: These examples are very helpful, thank you again! > Yes, I think it would be good to have a prototype if you are comfortable > working with C2 code already. I proposed small RFEs just for warmup ;) I talked with my colleagues and we decided to start the work by trying to fix the control/data-flow merge issues - *perhaps not for all cases, but at least for some of them*. Then, based on our experience with this and some benchmarking, we'll decide if we really need flow-sensitive analysis and how to best approach that. We'll definitely take a look at the RFEs as we move along!
Implementing the Stadler algorithm was just something that crossed my mind initially; it's very likely the last approach we'd try... I don't want to bite off more than I can chew. Regards, Cesar ________________________________ From: Vladimir Kozlov Sent: October 29, 2021 5:27 PM To: Cesar Soares Lucas ; Tobias Hartmann ; Ron Pressler Cc: John Rose ; Mark Reinhold ; hotspot-dev at openjdk.java.net ; Brian Stafford ; Martijn Verburg ; Hohensee, Paul Subject: Re: [External] : Re: RFC - Improving C2 Escape Analysis On 10/29/21 4:50 PM, Cesar Soares Lucas wrote: > Hi Vladimir and Tobias, > > >> Sure, here are four examples of EA and/or scalarization failing due to > >> complicated control/data flow: > >> https://cr.openjdk.java.net/~thartmann/EA_examples > > >> There are 2 test files with small methods for different EA cases I used to > >> see how EA works: > >> > >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java > >> test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java > > Thank you for the examples, Tobias/Vladimir. This is very helpful. > > >> Yes, finding a solution for allocation merges (or NULL) is a pain. I spent > >> some time investigating possible solutions for it but "no cigar". Maybe we > >> do indeed need control flow analysis to resolve this. > > By "need control flow analysis" you mean the flow-sensitive EA algorithm? My Yes. To clarify: I investigated solutions in the current flow-insensitive EA. > first idea to handle these control/data-merge issues was to implement in C2 the > same algorithm used by GRAAL - i.e., the algorithm described in the Stadler et al. > PEA paper.
Do you think this is reasonable? Yes, I think it would be good to have a prototype if you are comfortable working with C2 code already. I proposed small RFEs just for warmup ;) > > >> I am currently looking at iterative EA. Do more EA rounds if we can > >> eliminate more connected allocations. It was proposed by Vladimir Ivanov and > >> I have a working prototype. > > Cool! I'm curious, when do you plan to submit a Pull Request for this? I am investigating regressions in some benchmarks. > > >> There is also a suggestion from the Amazon Java group about "C2 Partial Escape > >> Analysis" which needs more discussion: > >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html > > I'd love to hear from them about their experience with these issues and if they > have any plans to work on this moving forward! I'll ping them on the thread > that you linked above. Yes, I would like them to participate too (CCing Paul). They sent the proposal almost 6 months ago and we did not hear any additional information after Vladimir Ivanov replied. Regards, Vladimir K > > > Regards, > Cesar > ------------------------------------------------------------------------------------------------------------------------ > *From:* Vladimir Kozlov > *Sent:* October 27, 2021 10:26 AM > *To:* Tobias Hartmann ; Cesar Soares Lucas ; Ron Pressler > > *Cc:* John Rose ; Mark Reinhold ; hotspot-dev at openjdk.java.net > ; Brian Stafford ; Martijn Verburg > > *Subject:* Re: [External] : Re: RFC - Improving C2 Escape Analysis > First, thank you, Cesar, for collecting data about C2 EA shortcomings.
> > I agree with the cases Tobias pointed out as possible starting points to improve EA. > > Yes, finding a solution for allocation merges (or NULL) is a pain. I spent some time investigating possible solutions for > it but "no cigar". Maybe we do indeed need control flow analysis to resolve this. > > I looked through JBS and found a few issues which do not require writing a new EA: > > https://bugs.openjdk.java.net/browse/JDK-7149991 > > https://bugs.openjdk.java.net/browse/JDK-8059378 > > https://bugs.openjdk.java.net/browse/JDK-8073358 > >
https://bugs.openjdk.java.net/browse/JDK-8155769 > > > Tobias also has a fix prototype for the next bug, which is not fixed yet: > https://bugs.openjdk.java.net/browse/JDK-8236493 > > > There are 2 test files with small methods for different EA cases I used to see how EA works: > > test/hotspot/jtreg/compiler/escapeAnalysis/Test6726999.java > test/hotspot/jtreg/compiler/escapeAnalysis/Test6689060.java > > You can start by looking at the above RFEs/bug or run these tests and see why scalarization failed for some cases. Except for > the known merge issue: > > https://bugs.openjdk.java.net/browse/JDK-6853701 > > > I am currently looking at iterative EA. Do more EA rounds if we can eliminate more connected allocations. It was > proposed by Vladimir Ivanov and I have a working prototype.
> > There is also a suggestion from the Amazon Java group about "C2 Partial Escape Analysis" which needs more discussion: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-May/047486.html > > > Thanks, > Vladimir K > > On 10/27/21 3:04 AM, Tobias Hartmann wrote: >> Hi Cesar, >> >> On 27.10.21 08:20, Cesar Soares Lucas wrote: >>> Right. I was suspecting this to be the most critical issue indeed. However, I >>> didn't know there was a case where "... the object does not escape on any paths >>> but control flow is too complicated for EA to prove that." Is this an issue >>> tracked in JBS or perhaps you can show me an example where this happens? >> >> Sure, here are four examples of EA and/or scalarization failing due to complicated control/data >> flow: https://cr.openjdk.java.net/~thartmann/EA_examples >> >> All examples would completely fold with inline types (Valhalla). >> >> I'm not sure if these issues are tracked by JBS issues but there's most likely an overlap with some >> of the issues you already described. >> >> Best regards, >> Tobias >>
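For readers unfamiliar with the "merge" shape discussed in this thread, a minimal sketch follows. It is an illustration only, not one of the linked EA examples or the jtreg tests, and all class and method names are hypothetical. In both methods the object never escapes; in the second, two allocations meet at one variable after a branch, which is my understanding of the kind of merge that the flow-insensitive EA reportedly fails to scalar-replace.

```java
public class AllocationMergeSketch {
    // Small value-like holder; hypothetical, used only for this illustration.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // One allocation that never escapes: the simple case for scalar replacement.
    static int noMerge(int a, int b) {
        Point p = new Point(a, b);
        return p.x + p.y;
    }

    // Two allocations merge into 'p' after the branch. Neither escapes,
    // but the field loads below see a merge of two different allocations.
    static int merge(boolean cond, int a, int b) {
        Point p = cond ? new Point(a, b) : new Point(b, a + 1);
        return p.x + p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        // Loop long enough that a JIT compiler would normally compile both methods.
        for (int i = 0; i < 1_000_000; i++) {
            sum += noMerge(i, 1) + merge((i & 1) == 0, i, 1);
        }
        System.out.println(sum);
    }
}
```

The sketch only shows the code shape. Whether a given JDK build eliminates either allocation has to be checked on that build, for example by comparing allocation rates with and without `-XX:-DoEscapeAnalysis`.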