From weijun at openjdk.org Sat Oct 1 15:02:04 2022 From: weijun at openjdk.org (Weijun Wang) Date: Sat, 1 Oct 2022 15:02:04 GMT Subject: RFR: JDK-8294618: Update openjdk.java.net => openjdk.org [v4] In-Reply-To: References: Message-ID: On Fri, 30 Sep 2022 17:38:54 GMT, Phil Race wrote: >> Why do we need to link to a URL? Why not `../../bridge/AccessBridgeCalls.c`? > > This is correct. > AccessBridge.h is published with the include/header files of the JDK and anyone reading it there can't exactly make use of "../" Thanks @prrace. And yes, git link is better. ------------- PR: https://git.openjdk.org/jdk/pull/10501 From jwaters at openjdk.org Sun Oct 2 10:16:45 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 2 Oct 2022 10:16:45 GMT Subject: RFR: 8292016: Split Windows API error handling from errors passed through the runtime in the JDK [v29] In-Reply-To: References: Message-ID: > EDIT: Cave and add the ErrorOrigin enum, to differentiate which error type the error reporting functions in libjava will look up. RUNTIME refers to errors passed through the runtime via errno, and SYSTEM is for native errors not visible to the runtime. Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Finish ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9870/files - new: https://git.openjdk.org/jdk/pull/9870/files/a4fa093e..6b43fc60 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=27-28 Stats: 331 lines in 63 files changed: 59 ins; 11 del; 261 mod Patch: https://git.openjdk.org/jdk/pull/9870.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9870/head:pull/9870 PR: https://git.openjdk.org/jdk/pull/9870 From jwaters at openjdk.org Sun Oct 2 10:19:18 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 2 Oct 2022 10:19:18 GMT Subject: RFR: 8292016: Split Windows API error handling from errors passed through the runtime in the JDK [v30] In-Reply-To: References: Message-ID: > EDIT: Cave and add the ErrorOrigin enum, to differentiate which error type the error reporting functions in libjava will look up. RUNTIME refers to errors passed through the runtime via errno, and SYSTEM is for native errors not visible to the runtime. Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into rework - Cleanup - Finish - Merge remote-tracking branch 'upstream/master' into rework - Cleanup - Cleanup - JNU_ThrowByNameWithMessageAndLastError - Progress - Remove getErrorString - Merge remote-tracking branch 'upstream/master' into rework - ... 
and 5 more: https://git.openjdk.org/jdk/compare/058336df...962b90ca ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9870/files - new: https://git.openjdk.org/jdk/pull/9870/files/6b43fc60..962b90ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=28-29 Stats: 638 lines in 24 files changed: 418 ins; 159 del; 61 mod Patch: https://git.openjdk.org/jdk/pull/9870.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9870/head:pull/9870 PR: https://git.openjdk.org/jdk/pull/9870 From jwaters at openjdk.org Sun Oct 2 10:44:07 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 2 Oct 2022 10:44:07 GMT Subject: RFR: 8292016: Split Windows API error handling from errors passed through the runtime in the JDK [v31] In-Reply-To: References: Message-ID: <6KLo6QWoXVZsz6btw25Ku75XDW3i947X0iFPkW8PS38=.6c484190-83d2-4d7b-b1ae-c375411b5218@github.com> > WIP Julian Waters has updated the pull request incrementally with one additional commit since the last revision: NET_ThrowByNameWithLastError ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9870/files - new: https://git.openjdk.org/jdk/pull/9870/files/962b90ca..67169553 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=29-30 Stats: 40 lines in 17 files changed: 0 ins; 0 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/9870.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9870/head:pull/9870 PR: https://git.openjdk.org/jdk/pull/9870 From jwaters at openjdk.org Sun Oct 2 13:54:12 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 2 Oct 2022 13:54:12 GMT Subject: RFR: 8292016: Split Windows API error handling from errors passed through the runtime in the JDK [v32] In-Reply-To: References: Message-ID: <7YJiFhgFKVyWN7rAVBiUHKvRuaN-lVLZXBMZK-22WSs=.1db0cb37-5149-4ca5-ad51-647fff5926e1@github.com> > WIP Julian Waters has updated the pull request incrementally with two additional commits since the last revision: - Aesthetic - Fix JLI_Perror and JLI_Snprintf ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9870/files - new: https://git.openjdk.org/jdk/pull/9870/files/67169553..a14c55c8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=30-31 Stats: 53 lines in 2 files changed: 18 ins; 34 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9870.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9870/head:pull/9870 PR: https://git.openjdk.org/jdk/pull/9870 From jwaters at openjdk.org Sun Oct 2 18:44:54 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 2 Oct 2022 18:44:54 GMT Subject: RFR: 8292016: Split Windows API error handling from errors passed through the runtime in the JDK [v33] In-Reply-To: References: Message-ID: > WIP Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Missing include on Windows ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9870/files - new: https://git.openjdk.org/jdk/pull/9870/files/a14c55c8..a92df1d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=31-32 Stats: 8 lines in 3 files changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9870.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9870/head:pull/9870 PR: 
https://git.openjdk.org/jdk/pull/9870 From jwaters at openjdk.org Sun Oct 2 20:44:04 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 2 Oct 2022 20:44:04 GMT Subject: RFR: 8292016: Split Windows API error handling from errors passed through the runtime in the JDK [v34] In-Reply-To: References: Message-ID: > WIP Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Cast to void instead of void pointer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9870/files - new: https://git.openjdk.org/jdk/pull/9870/files/a92df1d6..b89fb8b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=32-33 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9870.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9870/head:pull/9870 PR: https://git.openjdk.org/jdk/pull/9870 From svkamath at openjdk.org Mon Oct 3 05:45:55 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 3 Oct 2022 05:45:55 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v12] In-Reply-To: References: Message-ID: > 8289552: Make intrinsic conversions between bit representations of half precision values and floats Smita Kamath has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Updated instruction definition - Merge branch 'master' - Addressed review comment to update test case - Addressed review comments - Merge branch 'master' of https://git.openjdk.java.net/jdk into JDK-8289552 - Addressed review comments - Added missing parantheses - Addressed review comments, updated microbenchmark - Updated copyright comment - Updated test cases as per review comments - ... and 3 more: https://git.openjdk.org/jdk/compare/ac2b491b...69999ce4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9781/files - new: https://git.openjdk.org/jdk/pull/9781/files/8ccc0657..69999ce4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9781&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9781&range=10-11 Stats: 14672 lines in 427 files changed: 7255 ins; 5491 del; 1926 mod Patch: https://git.openjdk.org/jdk/pull/9781.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9781/head:pull/9781 PR: https://git.openjdk.org/jdk/pull/9781 From qamai at openjdk.org Mon Oct 3 08:36:30 2022 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 3 Oct 2022 08:36:30 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v11] In-Reply-To: References: <_Ghl2lsnrBhiWvVD3TMiwGo6SfQLl6idczb1QVqLa_I=.7cfa48e2-2987-43e0-a689-0e3462e4d270@github.com> Message-ID: On Fri, 30 Sep 2022 10:04:34 GMT, Quan Anh Mai wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comment to update test case > > src/hotspot/cpu/x86/x86.ad line 3674: > >> 3672: %} >> 3673: >> 3674: instruct convF2HF_mem_reg(memory mem, regF src, kReg ktmp, rRegI rtmp) %{ > > You can use `kmovwl` instead which will relax the avx512bw constraint, however, you will need avx512vl for `evcvtps2ph`. Thanks. 
Rethink about it, you can get 0x01 by right shifting k0 to the right - `kshiftrw(ktmp, k0, 15)` ------------- PR: https://git.openjdk.org/jdk/pull/9781 From jsjolen at openjdk.org Mon Oct 3 09:26:30 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 3 Oct 2022 09:26:30 GMT Subject: RFR: 8293691: converting a defined BasicType value to a string should not crash the VM [v2] In-Reply-To: <94ESvfZVeiBH-t36grd2jji9WdvcbbEtRJWyYgt96js=.d880d03a-593c-4584-9fd4-1ab5afce9aba@github.com> References: <94ESvfZVeiBH-t36grd2jji9WdvcbbEtRJWyYgt96js=.d880d03a-593c-4584-9fd4-1ab5afce9aba@github.com> Message-ID: On Wed, 28 Sep 2022 10:45:00 GMT, Johan Sj?len wrote: >> Hi, >> >> This implements Kim's suggested fix for this ticket. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Use ARRAY_SIZE instead Anyone want to sponsor this one :-)? ------------- PR: https://git.openjdk.org/jdk/pull/10447 From coleenp at openjdk.org Mon Oct 3 12:19:23 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Oct 2022 12:19:23 GMT Subject: RFR: 8293691: converting a defined BasicType value to a string should not crash the VM [v2] In-Reply-To: <94ESvfZVeiBH-t36grd2jji9WdvcbbEtRJWyYgt96js=.d880d03a-593c-4584-9fd4-1ab5afce9aba@github.com> References: <94ESvfZVeiBH-t36grd2jji9WdvcbbEtRJWyYgt96js=.d880d03a-593c-4584-9fd4-1ab5afce9aba@github.com> Message-ID: On Wed, 28 Sep 2022 10:45:00 GMT, Johan Sj?len wrote: >> Hi, >> >> This implements Kim's suggested fix for this ticket. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Use ARRAY_SIZE instead I do! ------------- PR: https://git.openjdk.org/jdk/pull/10447 From jsjolen at openjdk.org Mon Oct 3 12:20:51 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 3 Oct 2022 12:20:51 GMT Subject: Integrated: 8293691: converting a defined BasicType value to a string should not crash the VM In-Reply-To: References: Message-ID: On Tue, 27 Sep 2022 11:21:40 GMT, Johan Sj?len wrote: > Hi, > > This implements Kim's suggested fix for this ticket. This pull request has now been integrated. Changeset: f2a32d99 Author: Johan Sj?len Committer: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/f2a32d996ae09620474771c46a649f6c4e1148ad Stats: 15 lines in 2 files changed: 11 ins; 3 del; 1 mod 8293691: converting a defined BasicType value to a string should not crash the VM Reviewed-by: shade, coleenp, dlong ------------- PR: https://git.openjdk.org/jdk/pull/10447 From jsjolen at openjdk.org Mon Oct 3 13:02:51 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 3 Oct 2022 13:02:51 GMT Subject: RFR: 8294308: Allow dynamically choosing the MEMFLAGS of a type without ResourceObj [v4] In-Reply-To: References: Message-ID: > Here's a suggested solution for the ticket mentioned and a use case for outputStream. I'm not attached to the name. > > This saves space for all allocated outputStreams, which is nice. It also makes the purpose of ResourceObj more clear ("please handle the life cycle for me"), reducing the need for it. > > Thank you for considering it. 
Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10412/files - new: https://git.openjdk.org/jdk/pull/10412/files/36f23e5c..eb45b770 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10412&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10412&range=02-03 Stats: 107 lines in 3 files changed: 88 ins; 3 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/10412.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10412/head:pull/10412 PR: https://git.openjdk.org/jdk/pull/10412 From rehn at openjdk.org Mon Oct 3 13:08:41 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 3 Oct 2022 13:08:41 GMT Subject: RFR: 8289004: investigate if SharedRuntime::get_java_tid parameter should be a JavaThread* Message-ID: Hi, please consider. Yes it should be JavaThread*. But some additional places needed to use JavaThread* Passes t1-3. ------------- Commit messages: - JavaThread Changes: https://git.openjdk.org/jdk/pull/10532/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10532&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8289004 Stats: 26 lines in 5 files changed: 3 ins; 3 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/10532.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10532/head:pull/10532 PR: https://git.openjdk.org/jdk/pull/10532 From jsjolen at openjdk.org Mon Oct 3 13:37:25 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 3 Oct 2022 13:37:25 GMT Subject: RFR: 8294308: Allow dynamically choosing the MEMFLAGS of a type without ResourceObj [v5] In-Reply-To: References: Message-ID: > Here's a suggested solution for the ticket mentioned and a use case for outputStream. I'm not attached to the name. > > This saves space for all allocated outputStreams, which is nice. It also makes the purpose of ResourceObj more clear ("please handle the life cycle for me"), reducing the need for it. > > Thank you for considering it. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10412/files - new: https://git.openjdk.org/jdk/pull/10412/files/eb45b770..7076784c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10412&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10412&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10412.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10412/head:pull/10412 PR: https://git.openjdk.org/jdk/pull/10412 From jsjolen at openjdk.org Mon Oct 3 13:37:27 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 3 Oct 2022 13:37:27 GMT Subject: RFR: 8294308: Allow dynamically choosing the MEMFLAGS of a type without ResourceObj [v4] In-Reply-To: References: Message-ID: On Mon, 3 Oct 2022 13:02:51 GMT, Johan Sj?len wrote: >> Here's a suggested solution for the ticket mentioned and a use case for outputStream. I'm not attached to the name. >> >> This saves space for all allocated outputStreams, which is nice. It also makes the purpose of ResourceObj more clear ("please handle the life cycle for me"), reducing the need for it. >> >> Thank you for considering it. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Refactoring Scratch the comment regarding memory leaks. 
C heap allocated ResourceObjs must be manually deleted anyway. ------------- PR: https://git.openjdk.org/jdk/pull/10412 From jwaters at openjdk.org Mon Oct 3 13:37:38 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 3 Oct 2022 13:37:38 GMT Subject: RFR: 8292016: Split Windows API error handling from errors passed through the runtime in the JDK [v35] In-Reply-To: References: Message-ID: > WIP Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Naming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9870/files - new: https://git.openjdk.org/jdk/pull/9870/files/b89fb8b8..1be7e721 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9870&range=33-34 Stats: 63 lines in 13 files changed: 5 ins; 0 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/9870.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9870/head:pull/9870 PR: https://git.openjdk.org/jdk/pull/9870 From ngasson at openjdk.org Mon Oct 3 14:33:30 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Mon, 3 Oct 2022 14:33:30 GMT Subject: RFR: 8294261: AArch64: Use pReg instead of pRegGov when possible In-Reply-To: References: Message-ID: On Wed, 28 Sep 2022 05:52:40 GMT, Ningsheng Jian wrote: > Currently we allocate SVE predicate register p0-p6 for pRegGov operand, which are used as governing predicates for load/store and arithmetic, and also define pReg operand for all allocatable predicate registers. Since some SVE instructions are fine to use/define p8-p15, e.g. predicate operations, this patch makes the matcher work for mixed use of pRegGov and pReg, and tries to match pReg when possible. If a predicate reg is defined as pReg but used as pRegGov, register allocator will handle that properly. > > With p8-p15 being used as non-temp register, we need to save them as well when saving all registers. The code of setting predicate reg slot in OopMap in RegisterSaver::save_live_registers() is also removed, because on safepoint, vector masks have been transformed to vector [1]. > > Tested on different SVE systems. Also tested with making RA to allocate p8-p15 first for vReg operand, so that a p8-p15 reg has more chance to be allocated, and if an SVE instruction, emitted by ad rule, does not accept p8-p5, assembler will crash. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vector.cpp#L265 Looks OK to me! ------------- Marked as reviewed by ngasson (Reviewer). PR: https://git.openjdk.org/jdk/pull/10461 From duke at openjdk.org Mon Oct 3 15:00:33 2022 From: duke at openjdk.org (duke) Date: Mon, 3 Oct 2022 15:00:33 GMT Subject: Withdrawn: 8292006: Move thread accessor classes to threadJavaClasses.hpp In-Reply-To: References: Message-ID: On Sat, 6 Aug 2022 00:11:08 GMT, Ioi Lam wrote: > To improve modularity and build time, move the declaration of the following accessor from classfile/javaClasses.hpp to runtime/threadJavaClasses.hpp: > > + java_lang_Thread_FieldHolder > + java_lang_Thread_Constants > + java_lang_ThreadGroup > + java_lang_VirtualThread > > Also move javaThreadStatus.hpp from share/classfile to share/runtime, where it belongs. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/9788 From dsamersoff at openjdk.org Mon Oct 3 15:11:38 2022 From: dsamersoff at openjdk.org (Dmitry Samersoff) Date: Mon, 3 Oct 2022 15:11:38 GMT Subject: RFR: 8292016: Cleanup legacy error reporting in the JDK outside of HotSpot [v35] In-Reply-To: References: Message-ID: <_tGORytJS0OmSMKoP6TswalJYnLASUA5F8zEmWmmRvk=.b01c1ac2-3fee-4671-bdb3-649a605faf26@github.com> On Mon, 3 Oct 2022 13:37:38 GMT, Julian Waters wrote: >> A large section of error reporting code in the JDK does not properly handle WIN32 API errors and instead mixes them with errors originating from C. Since they can be rather easily replaced and coming up with an elegant solution proved to be too much of a hassle to be worth it, and some of the concerns they address no longer are an issue with current versions of the platforms supported by the JDK, they can be easily removed without much effect. The remaining utilities that are still needed now instead report directly from strerror, with a new subsystem for WIN32 errors put in place wherever required, to minimize confusion when they are used, which was a problem with earlier solutions to this issue. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Naming src/java.base/share/native/libjli/jli_util.h line 93: > 91: > 92: /* Support for using perror with printf arguments */ > 93: #define JLI_Perror(...) \ Is it better to convert this to a function? ------------- PR: https://git.openjdk.org/jdk/pull/9870 From dsamersoff at openjdk.org Mon Oct 3 15:17:19 2022 From: dsamersoff at openjdk.org (Dmitry Samersoff) Date: Mon, 3 Oct 2022 15:17:19 GMT Subject: RFR: 8292016: Cleanup legacy error reporting in the JDK outside of HotSpot [v35] In-Reply-To: References: Message-ID: On Mon, 3 Oct 2022 13:37:38 GMT, Julian Waters wrote: >> A large section of error reporting code in the JDK does not properly handle WIN32 API errors and instead mixes them with errors originating from C. Since they can be rather easily replaced and coming up with an elegant solution proved to be too much of a hassle to be worth it, and some of the concerns they address no longer are an issue with current versions of the platforms supported by the JDK, they can be easily removed without much effect. The remaining utilities that are still needed now instead report directly from strerror, with a new subsystem for WIN32 errors put in place wherever required, to minimize confusion when they are used, which was a problem with earlier solutions to this issue. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Naming src/java.base/share/native/libzip/zip_util.c line 871: > 869: > 870: if (zfd == -1) { > 871: #ifdef _WIN32 1. Could we use strerror_s here? 2. It's better to encapsulate similar code to a separate function or macro, rather than add #ifdef _WIN32 in the middle of the code. 
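For illustration, one shape such a helper could take (a sketch only: last_error_string is a hypothetical name, while getLastWinErrorString is the Windows-side routine already quoted elsewhere in this review):

    #include <errno.h>
    #include <string.h>

    /* Sketch of a single helper that hides the platform split so call
     * sites no longer need an inline #ifdef _WIN32. Returns NULL when
     * no error text is available. */
    static const char* last_error_string(char* buf, size_t buflen) {
    #ifdef _WIN32
        /* Windows API errors: format the text of the last error into buf */
        return getLastWinErrorString(buf, buflen) > 0 ? buf : NULL;
    #else
        /* C runtime errors: errno/strerror, buf is not needed here */
        return errno != 0 ? strerror(errno) : NULL;
    #endif
    }

Call sites would then read the same on both platforms.
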
------------- PR: https://git.openjdk.org/jdk/pull/9870 From dsamersoff at openjdk.org Mon Oct 3 15:39:21 2022 From: dsamersoff at openjdk.org (Dmitry Samersoff) Date: Mon, 3 Oct 2022 15:39:21 GMT Subject: RFR: 8292016: Cleanup legacy error reporting in the JDK outside of HotSpot [v35] In-Reply-To: References: Message-ID: <_x-Zql7uyiZQzveqoRLS--fQEJB_Qr5ChyQjUtnmoGY=.28982dc5-4831-439e-bf23-85e14dacf978@github.com> On Mon, 3 Oct 2022 13:37:38 GMT, Julian Waters wrote: >> A large section of error reporting code in the JDK does not properly handle WIN32 API errors and instead mixes them with errors originating from C. Since they can be rather easily replaced and coming up with an elegant solution proved to be too much of a hassle to be worth it, and some of the concerns they address no longer are an issue with current versions of the platforms supported by the JDK, they can be easily removed without much effect. The remaining utilities that are still needed now instead report directly from strerror, with a new subsystem for WIN32 errors put in place wherever required, to minimize confusion when they are used, which was a problem with earlier solutions to this issue. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Naming Please, move code like one below to a separate function rather than spreading #ifdef _WIN32 all over the code. #ifdef _WIN32 /* The implementation on Windows uses the Windows API */ char buf[256]; size_t n = getLastWinErrorString(buf, sizeof(buf)); if (n > 0) { #else char* buf = NULL; const int error = errno; if (error != 0) buf = strerror(error); if (buf != NULL) { #endif ------------- Changes requested by dsamersoff (Reviewer). PR: https://git.openjdk.org/jdk/pull/9870 From darcy at openjdk.org Mon Oct 3 17:20:53 2022 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 3 Oct 2022 17:20:53 GMT Subject: RFR: JDK-8294618: Update openjdk.java.net => openjdk.org [v4] In-Reply-To: References: Message-ID: On Fri, 30 Sep 2022 20:25:28 GMT, Joe Darcy wrote: > Also, FWIW, there are 100+ hits in `test` as well. But that is so many it might warrant a separate PR..? Filed a few follow-up bugs: JDK-8294724: Update openjdk.java.net => openjdk.org in tests (umbrella) JDK-8294725: Update openjdk.java.net => openjdk.org in java command man page ------------- PR: https://git.openjdk.org/jdk/pull/10501 From darcy at openjdk.org Mon Oct 3 17:25:30 2022 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 3 Oct 2022 17:25:30 GMT Subject: RFR: JDK-8294618: Update openjdk.java.net => openjdk.org [v6] In-Reply-To: References: Message-ID: <_UYUckHSBcJizYd7JBVbr6evdOrHu9h2MopGUlzrLR8=.b5c6d747-800b-462b-a19a-73a0f5096e3e@github.com> > With the domain change from openjdk.java.net to openjdk.org, references to URLs in the sources should be updated. > > Updates were made using a shell script. I"ll run a copyright updater before any push. Joe Darcy has updated the pull request incrementally with one additional commit since the last revision: Update doc directory files. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/10501/files - new: https://git.openjdk.org/jdk/pull/10501/files/fbaf3d4c..6bf7bf61 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10501&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10501&range=04-05 Stats: 38 lines in 4 files changed: 0 ins; 0 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/10501.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10501/head:pull/10501 PR: https://git.openjdk.org/jdk/pull/10501 From darcy at openjdk.org Mon Oct 3 17:29:45 2022 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 3 Oct 2022 17:29:45 GMT Subject: RFR: JDK-8294618: Update openjdk.java.net => openjdk.org [v7] In-Reply-To: References: Message-ID: <-_gZsyDFTlHjj-7UiLaIjMBikGCJU8M2Kz9D7dm-20I=.7eacd8cd-6f3c-4a9d-9bbb-18291146b58e@github.com> > With the domain change from openjdk.java.net to openjdk.org, references to URLs in the sources should be updated. > > Updates were made using a shell script. I"ll run a copyright updater before any push. Joe Darcy has updated the pull request incrementally with one additional commit since the last revision: Update make directory. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10501/files - new: https://git.openjdk.org/jdk/pull/10501/files/6bf7bf61..224ed7a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10501&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10501&range=05-06 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10501.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10501/head:pull/10501 PR: https://git.openjdk.org/jdk/pull/10501 From darcy at openjdk.org Mon Oct 3 17:38:14 2022 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 3 Oct 2022 17:38:14 GMT Subject: RFR: JDK-8294618: Update openjdk.java.net => openjdk.org [v4] In-Reply-To: References: Message-ID: On Mon, 3 Oct 2022 17:17:39 GMT, Joe Darcy wrote: > > Also, FWIW, there are 100+ hits in `test` as well. But that is so many it might warrant a separate PR..? > > Filed a few follow-up bugs: > > JDK-8294724: Update openjdk.java.net => openjdk.org in tests (umbrella) JDK-8294725: Update openjdk.java.net => openjdk.org in java command man page And also filed JDK-8294728: Update openjdk.java.net => openjdk.org in hotspot unit test docs ------------- PR: https://git.openjdk.org/jdk/pull/10501 From coleenp at openjdk.org Mon Oct 3 17:43:24 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Oct 2022 17:43:24 GMT Subject: RFR: 8294308: Allow dynamically choosing the MEMFLAGS of a type without ResourceObj [v5] In-Reply-To: References: Message-ID: On Mon, 3 Oct 2022 13:37:25 GMT, Johan Sj?len wrote: >> Here's a suggested solution for the ticket mentioned and a use case for outputStream. I'm not attached to the name. >> >> This saves space for all allocated outputStreams, which is nice. It also makes the purpose of ResourceObj more clear ("please handle the life cycle for me"), reducing the need for it. >> >> Thank you for considering it. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Add comment src/hotspot/share/memory/allocation.hpp line 183: > 181: } > 182: > 183: static ALWAYSINLINE void* operator new(size_t size, Were you going to move MEMFLAGS to the second argument in the new operators also? 
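In other words, declarations shaped roughly like this (just a sketch of the suggested parameter order, not the actual patch):

    // MEMFLAGS directly after the mandatory size argument, ahead of any
    // other placement parameters such as the nothrow tag:
    static ALWAYSINLINE void* operator new(size_t size, MEMFLAGS flags) throw() {
      return AllocateHeap(size, flags);
    }
    static ALWAYSINLINE void* operator new(size_t size, MEMFLAGS flags,
                                           const std::nothrow_t&) throw() {
      return AllocateHeap(size, flags, AllocFailStrategy::RETURN_NULL);
    }
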
src/hotspot/share/memory/allocation.hpp line 264: > 262: > 263: // Dynamically pick the memory flags at allocation > 264: class CHeapObjDynamic { This can be simply using CHeapObjDynamic = CHeapObjImpl; so not to repeat all the various new operator declarations. src/hotspot/share/utilities/ostream.hpp line 45: > 43: // This allows for redirection via -XX:+DisplayVMOutputToStdout and > 44: // -XX:+DisplayVMOutputToStderr > 45: class outputStream : public CHeapObjDyn { This should be CHeapObjDynamic. ------------- PR: https://git.openjdk.org/jdk/pull/10412 From coleenp at openjdk.org Mon Oct 3 17:43:26 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 3 Oct 2022 17:43:26 GMT Subject: RFR: 8294308: Allow dynamically choosing the MEMFLAGS of a type without ResourceObj [v3] In-Reply-To: References: Message-ID: On Thu, 29 Sep 2022 08:50:11 GMT, Johan Sj?len wrote: >> src/hotspot/share/opto/compile.hpp line 1064: >> >>> 1062: delete _print_inlining_stream; >>> 1063: }; >>> 1064: >> >> compile.cpp has print_inlining_stream_free() calls which will leak the stringStream now if called. I think this function needs to be removed and it should call the reset function to reinitialize the stream. >> There should be compiler tests that will fail if print_inlining_stream_free() is called with a null _print_inlining_stream pointer (I think the delete should fail (?) with null) > > This is removed as part of https://github.com/openjdk/jdk/pull/10396 . The function used to be here but I merged with upstream when that went in. delete on null is OK, just like with free. Ok, I see it removed now. ------------- PR: https://git.openjdk.org/jdk/pull/10412 From svkamath at openjdk.org Mon Oct 3 17:49:14 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 3 Oct 2022 17:49:14 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v11] In-Reply-To: <5pqC4k2fyhaYIa9d6D3Dciv2ohYR-JCPvYW7lZsbXhw=.4a3071d6-39b8-4828-86a4-9c3871401844@github.com> References: <_Ghl2lsnrBhiWvVD3TMiwGo6SfQLl6idczb1QVqLa_I=.7cfa48e2-2987-43e0-a689-0e3462e4d270@github.com> <5pqC4k2fyhaYIa9d6D3Dciv2ohYR-JCPvYW7lZsbXhw=.4a3071d6-39b8-4828-86a4-9c3871401844@github.com> Message-ID: On Fri, 30 Sep 2022 09:59:02 GMT, Bhavana Kilambi wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comment to update test case > > Hi, would you be adding IR tests to verify the generation of the the newly introduced IR nodes? @Bhavana-Kilambi, I plan to do this in a separate PR along with the gtest. Here's the bug https://bugs.openjdk.org/browse/JDK-8293323. ------------- PR: https://git.openjdk.org/jdk/pull/9781 From svkamath at openjdk.org Mon Oct 3 17:49:17 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 3 Oct 2022 17:49:17 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v11] In-Reply-To: References: <_Ghl2lsnrBhiWvVD3TMiwGo6SfQLl6idczb1QVqLa_I=.7cfa48e2-2987-43e0-a689-0e3462e4d270@github.com> Message-ID: On Mon, 3 Oct 2022 08:34:06 GMT, Quan Anh Mai wrote: >> src/hotspot/cpu/x86/x86.ad line 3674: >> >>> 3672: %} >>> 3673: >>> 3674: instruct convF2HF_mem_reg(memory mem, regF src, kReg ktmp, rRegI rtmp) %{ >> >> You can use `kmovwl` instead which will relax the avx512bw constraint, however, you will need avx512vl for `evcvtps2ph`. Thanks. 
> > Rethink about it, you can get 0x01 by right shifting k0 to the right - `kshiftrw(ktmp, k0, 15)` @merykitty Thanks for the suggestion. I will update the instruct to use kmovwl. I will also experiment with kshiftrw and let you know. ------------- PR: https://git.openjdk.org/jdk/pull/9781 From kim.barrett at oracle.com Mon Oct 3 19:14:13 2022 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 3 Oct 2022 19:14:13 +0000 Subject: RFC: linux-aarch64 and LSE support In-Reply-To: <4a767b25-1539-183a-e4cd-4852f633e77d@littlepinkcloud.com> References: <7BD2A887-C204-4A36-8F2E-FA7386C17E2D@oracle.com> <28764c2d-6963-98ca-1212-b296d649e513@littlepinkcloud.com> <268A90AB-A4F1-44CB-A027-C07A12F9209B@oracle.com> <416c7338-2dff-7997-c95f-c9e1c74de180@littlepinkcloud.com> <0958C022-824B-4492-9D82-393F64965D5E@oracle.com> <246e397f-2f10-84fa-19d7-65b1472606e9@littlepinkcloud.com> <4a767b25-1539-183a-e4cd-4852f633e77d@littlepinkcloud.com> Message-ID: > On Sep 20, 2022, at 7:15 AM, Andrew Haley wrote: > > [?] > > To summarize, for memory_order_conservative with ll/sc-style atomics > > - For cmpxchg, use > > I think you found a bug in the pre-LSE code. > > aarch64_atomic_cmpxchg_8_default_impl: > #ifdef __ARM_FEATURE_ATOMICS > mov x3, x1 > casal x3, x2, [x0] > #else > dmb ish > prfm pstl1strm, [x0] > 0: ldxr x3, [x0] > cmp x3, x1 > b.ne 1f > stxr w8, x2, [x0] > cbnz w8, 0b > #endif > 1: dmb ish > mov x0, x3 > ret > > This should be LDAXR;STLXR. CASAL is correct. I don't think it needs to be LDAXR;STLXR. And neither does https://patchwork.kernel.org/patch/3575821/ The surrounding DMB-ISH pair provide the needed ordering. The "traditional" implementation for all of these operations on ll/sc has been a relaxed ll/sc loop preceeded and followed by fences. > > [?] > > (A) The default (ll/sc) implementations (atomic_linux_aarch64.S) are all > > "acq-rel" rather than "release". > > Good. Not good if one believes the afore-mentioned kernel.org analysis, which is what we relied on until 8261027 for linux, and are still relying on for bsd. That analysis says "release" is sufficient, and "acq-rel" is just an unnecessary "acquire". I didn't find any justification in the bug or PR discussion for that change. > > Meanwhile, the generated stubs for "xchg" variants are "release". > > Really? I see > > void gen_swpal_entry(Assembler::operand_size size) { > Register prev = r2, addr = c_rarg0, incr = c_rarg1; > __ swpal(size, incr, prev, addr); > __ membar(Assembler::StoreStore|Assembler::StoreLoad); > > That's acquire/release. What are you looking at? My comment was about that PR, not about later changes that are now on mainline. That is, I was referring to this: https://github.com/openjdk/jdk/pull/2434/files#diff-9112056f732229b18fec48fb0b20a3fe824de49d0abd41fbdb4202cfe70ad114R5607-R5621 from the 8261027 PR. Note the `__ atomic_xchgl`. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From kim.barrett at oracle.com Mon Oct 3 19:16:31 2022 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 3 Oct 2022 19:16:31 +0000 Subject: RFC: linux-aarch64 and LSE support In-Reply-To: <4a767b25-1539-183a-e4cd-4852f633e77d@littlepinkcloud.com> References: <7BD2A887-C204-4A36-8F2E-FA7386C17E2D@oracle.com> <28764c2d-6963-98ca-1212-b296d649e513@littlepinkcloud.com> <268A90AB-A4F1-44CB-A027-C07A12F9209B@oracle.com> <416c7338-2dff-7997-c95f-c9e1c74de180@littlepinkcloud.com> <0958C022-824B-4492-9D82-393F64965D5E@oracle.com> <246e397f-2f10-84fa-19d7-65b1472606e9@littlepinkcloud.com> <4a767b25-1539-183a-e4cd-4852f633e77d@littlepinkcloud.com> Message-ID: <9E71444D-98A2-4179-A618-54027E6BC73B@oracle.com> > On Sep 20, 2022, at 7:15 AM, Andrew Haley wrote: > > On 9/20/22 11:32, Kim Barrett wrote: > > There is a big comment in front of the new stub generation code, talking about > > how a acq-rel operation doesn't need a preceeding fence when using LSE > > atomics. I can see how that's very useful for cmpxchg. (And the comment > > mostly discusses cmpxchg.) But I'm not certain of it's relevance for other > > operations. > > It's the same for all ops. None of them need a preceding fence. My question is not whether using an acq-rel operation is sufficient to remove the need for a preceding fence, it's whether that's *necessary*? Specifically, is a release operation also sufficient, as discussed in the kernel patch? https://patchwork.kernel.org/patch/3575821/ That kernel patch argues that for ll/sc atomics only a release operation is needed (except for cmpxchg). And that still seems to hold for the linux kernel - it uses ldxr/stlxr with a trailing dmb ish. Are you claiming otherwise? (For some reason 8261027 used ldaxr rather than ldxr in the new .S file, without any justification for that change.) We also need to understand what forms can/should be used for LSE. (Maybe a release operation with trailing fence works there too? But let's not go there.) I had much the same reaction as Dean about this, which your reply to him seems to agree with. Specifically, we only need an acq-rel LSE to get the behavior we want from LSE atomics. So I think that for memory_order_conservative we want to use: 1. For ll/sc cmpxchg: ldxr/stxr with leading and trailing dmb ish, e.g. a Relaxed operation with both leading and trailing fences. 2. For ll/sc non-cmpxchg: ldxr/stlxr with trailing dmb ish, e.g. a Release operation with a trailing fence. 3. For LSE: an acq-rel instruction for all operations. That looks like what the linux kernel is using. It also mostly agrees with a recent change to gcc outline-atomics support: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=bc25483c055d62f94f8c289f80843dda3c4a6ff4 2022-05-13 Sebastian Pop PR target/105162 (included in gcc12.2) Unfortunately, that change doesn't implement ll/sc cmpxchg the way we want (item 1), instead using ldxr/stlxr+dmb. That's wrong, according to the afore-referenced linux kernel patch (an analysis I agree with). (There is also the problem of the __sync suite not having something for Atomic::xchg.) So how do we get the code we want? There doesn't seem to be a way for us to use gcc intrinsics. The "legacy" __sync_xxx operations are documented as "full barriers" as we want, and the above referenced change comes pretty close. But we couldn't use that right now even if that change perfectly matched what we want, as it is far too new. 
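To make the three forms listed above concrete, roughly (hand-written sketches in the style of the .S listing quoted earlier in this thread, not proposed patches):

    // 1. ll/sc cmpxchg: relaxed ldxr/stxr loop, fenced on both sides
          dmb ish
    0:    ldxr x3, [x0]
          cmp  x3, x1
          b.ne 1f
          stxr w8, x2, [x0]
          cbnz w8, 0b
    1:    dmb ish

    // 2. ll/sc xchg (other RMW ops analogous): release store, trailing fence only
    0:    ldxr  x3, [x0]
          stlxr w8, x2, [x0]
          cbnz  w8, 0b
          dmb ish

    // 3. LSE: a single acq-rel instruction, no fences
          casal x3, x2, [x0]    // or swpal / ldaddal for xchg / add
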
(After digging into some of this I have a great deal of sympathy for ErikO's position here: https://mail.openjdk.org/pipermail/hotspot-dev/2019-November/039931.html) So I think we are (at least for now) stuck with rolling our own. What we have has some problems, and I think can be improved in various ways (for example, I think it is possible to dispense with runtime stub generation). I'm planning to offer some PRs along those lines. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From prr at openjdk.org Mon Oct 3 20:08:19 2022 From: prr at openjdk.org (Phil Race) Date: Mon, 3 Oct 2022 20:08:19 GMT Subject: RFR: JDK-8294618: Update openjdk.java.net => openjdk.org [v7] In-Reply-To: <-_gZsyDFTlHjj-7UiLaIjMBikGCJU8M2Kz9D7dm-20I=.7eacd8cd-6f3c-4a9d-9bbb-18291146b58e@github.com> References: <-_gZsyDFTlHjj-7UiLaIjMBikGCJU8M2Kz9D7dm-20I=.7eacd8cd-6f3c-4a9d-9bbb-18291146b58e@github.com> Message-ID: On Mon, 3 Oct 2022 17:29:45 GMT, Joe Darcy wrote: >> With the domain change from openjdk.java.net to openjdk.org, references to URLs in the sources should be updated. >> >> Updates were made using a shell script. I"ll run a copyright updater before any push. > > Joe Darcy has updated the pull request incrementally with one additional commit since the last revision: > > Update make directory. src/jdk.accessibility/windows/native/include/bridge/AccessBridgeCalls.h line 36: > 34: * https://git.openjdk.org/jdk/blob/master/src/jdk.accessibility/windows/native/bridge/AccessBridgeCalls.c > 35: * > 36: * Also note that the API is used in the jaccessinspector and jaccesswalker tools. The problem with this is, is that anyone who gets JDK 20 (or 21 the LTS) will be forever more then pointed at the OpenJDK "tip" and if we made an incompatible ABI change, that would be a problem. At this point I'd prefer that this be updated to point to JDK 17, as in https://github.com/openjdk/jdk17/blob/master/src/jdk.accessibility/windows/native/jaccesswalker/jaccesswalker.cpp So it is a defined, known, compatible version. ------------- PR: https://git.openjdk.org/jdk/pull/10501 From darcy at openjdk.org Mon Oct 3 20:37:11 2022 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 3 Oct 2022 20:37:11 GMT Subject: RFR: JDK-8294618: Update openjdk.java.net => openjdk.org [v8] In-Reply-To: References: Message-ID: > With the domain change from openjdk.java.net to openjdk.org, references to URLs in the sources should be updated. > > Updates were made using a shell script. I"ll run a copyright updater before any push. Joe Darcy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Update accessibility URLs. - Merge branch 'master' into JDK-8294618 - Update make directory. - Update doc directory files. - Update hg URLs to git. - Merge branch 'master' into JDK-8294618 - http -> https - Undo manpage update. - Update copyright. - Revert unintended update to binary file. - ... 
and 1 more: https://git.openjdk.org/jdk/compare/7a8d31f3...4055f1a6 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10501/files - new: https://git.openjdk.org/jdk/pull/10501/files/224ed7a0..4055f1a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10501&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10501&range=06-07 Stats: 5510 lines in 75 files changed: 3554 ins; 1669 del; 287 mod Patch: https://git.openjdk.org/jdk/pull/10501.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10501/head:pull/10501 PR: https://git.openjdk.org/jdk/pull/10501 From darcy at openjdk.org Mon Oct 3 20:37:14 2022 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 3 Oct 2022 20:37:14 GMT Subject: RFR: JDK-8294618: Update openjdk.java.net => openjdk.org [v7] In-Reply-To: References: <-_gZsyDFTlHjj-7UiLaIjMBikGCJU8M2Kz9D7dm-20I=.7eacd8cd-6f3c-4a9d-9bbb-18291146b58e@github.com> Message-ID: On Mon, 3 Oct 2022 20:04:38 GMT, Phil Race wrote: >> Joe Darcy has updated the pull request incrementally with one additional commit since the last revision: >> >> Update make directory. > > src/jdk.accessibility/windows/native/include/bridge/AccessBridgeCalls.h line 36: > >> 34: * https://git.openjdk.org/jdk/blob/master/src/jdk.accessibility/windows/native/bridge/AccessBridgeCalls.c >> 35: * >> 36: * Also note that the API is used in the jaccessinspector and jaccesswalker tools. > > The problem with this is, is that anyone who gets JDK 20 (or 21 the LTS) will be forever more then pointed at the OpenJDK "tip" and if we made an incompatible ABI change, that would be a problem. > At this point I'd prefer that this be updated to point to JDK 17, as in > https://github.com/openjdk/jdk17/blob/master/src/jdk.accessibility/windows/native/jaccesswalker/jaccesswalker.cpp > So it is a defined, known, compatible version. Updated to refer to JDK 17 specifically. ------------- PR: https://git.openjdk.org/jdk/pull/10501 From dholmes at openjdk.org Tue Oct 4 04:35:11 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Oct 2022 04:35:11 GMT Subject: RFR: 8289004: investigate if SharedRuntime::get_java_tid parameter should be a JavaThread* In-Reply-To: References: Message-ID: On Mon, 3 Oct 2022 12:59:21 GMT, Robbin Ehn wrote: > Hi, please consider. > > Yes it should be JavaThread*. > But some additional places needed to use JavaThread* > > Passes t1-3. Changes look fine, but some additional changes needed in `get_java_tid`. Thanks. src/hotspot/share/runtime/sharedRuntime.cpp line 1004: > 1002: return 0; > 1003: } > 1004: guarantee(Thread::current() != thread || JavaThread::cast(thread)->is_oop_safe(), You don't need the cast any more. src/hotspot/share/runtime/sharedRuntime.cpp line 1006: > 1004: guarantee(Thread::current() != thread || JavaThread::cast(thread)->is_oop_safe(), > 1005: "current cannot touch oops after its GC barrier is detached."); > 1006: oop obj = JavaThread::cast(thread)->threadObj(); Again no cast needed. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/10532 From dholmes at openjdk.org Tue Oct 4 05:47:58 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Oct 2022 05:47:58 GMT Subject: RFR: 8292016: Cleanup legacy error reporting in the JDK outside of HotSpot [v35] In-Reply-To: References: Message-ID: On Mon, 3 Oct 2022 13:37:38 GMT, Julian Waters wrote: >> A large section of error reporting code in the JDK does not properly handle WIN32 API errors and instead mixes them with errors originating from C. 
Since they can be rather easily replaced and coming up with an elegant solution proved to be too much of a hassle to be worth it, and some of the concerns they address no longer are an issue with current versions of the platforms supported by the JDK, they can be easily removed without much effect. The remaining utilities that are still needed now instead report directly from strerror, with a new subsystem for WIN32 errors put in place wherever required, to minimize confusion when they are used, which was a problem with earlier solutions to this issue. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Naming > JNU_ThrowByNameWithStrerror > JNU_ThrowByNamePerror > JNU_ThrowIOExceptionWithIOError The problem with this kind of naming is that the user of the API for which these functions are then called, should not have to know anything about the origins of the actual error code/string. The call-sites really do want to say `ThrowXXXWithLastError`. Maybe I'm wrong to believe that they can be ignorant of the the details but the level of abstraction seems wrong to me. ------------- PR: https://git.openjdk.org/jdk/pull/9870 From alanb at openjdk.org Tue Oct 4 06:51:20 2022 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 4 Oct 2022 06:51:20 GMT Subject: RFR: 8292016: Cleanup legacy error reporting in the JDK outside of HotSpot [v35] In-Reply-To: References: Message-ID: On Mon, 3 Oct 2022 13:37:38 GMT, Julian Waters wrote: >> A large section of error reporting code in the JDK does not properly handle WIN32 API errors and instead mixes them with errors originating from C. Since they can be rather easily replaced and coming up with an elegant solution proved to be too much of a hassle to be worth it, and some of the concerns they address no longer are an issue with current versions of the platforms supported by the JDK, they can be easily removed without much effect. The remaining utilities that are still needed now instead report directly from strerror, with a new subsystem for WIN32 errors put in place wherever required, to minimize confusion when they are used, which was a problem with earlier solutions to this issue. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Naming I skimmed through the latest version. I think most of the changes at the use-sites need to be reverted as they are just too confusing and error prone. Maybe this PR should be changed back to draft until there is some agreement on this topic? ------------- PR: https://git.openjdk.org/jdk/pull/9870 From jbhateja at openjdk.org Tue Oct 4 06:52:56 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 4 Oct 2022 06:52:56 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v11] In-Reply-To: References: <_Ghl2lsnrBhiWvVD3TMiwGo6SfQLl6idczb1QVqLa_I=.7cfa48e2-2987-43e0-a689-0e3462e4d270@github.com> Message-ID: On Mon, 3 Oct 2022 17:47:00 GMT, Smita Kamath wrote: >> Rethink about it, you can get 0x01 by right shifting k0 to the right - `kshiftrw(ktmp, k0, 15)` > > @merykitty Thanks for the suggestion. I will update the instruct to use kmovwl. I will also experiment with kshiftrw and let you know. > You can use `kmovwl` instead which will relax the avx512bw constraint, however, you will need avx512vl for `evcvtps2ph`. Thanks. 
Yes, in general all AVX512VL targets support AVX512BW, but cloud instances give freedom to enable custom features. Regarding K0, as per section "15.6.1.1" of SDM, expectation is that K0 can appear in source and destination of regular non predication context, k0 should always contain all true mask so it should be unmodifiable for subsequent usages i.e. should not be present as destination of a mask manipulating instruction. Your suggestion is to have that in source but it may not work either. Changing existing sequence to use kmovw and replace AVX512BW with AVX512VL will again mean introducing an additional predication check for this pattern. ------------- PR: https://git.openjdk.org/jdk/pull/9781 From dholmes at openjdk.org Tue Oct 4 07:00:04 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 4 Oct 2022 07:00:04 GMT Subject: RFR: 8294580: frame::interpreter_frame_print_on() crashes if free BasicObjectLock exists in frame In-Reply-To: References: Message-ID: On Thu, 29 Sep 2022 12:49:27 GMT, Richard Reingruber wrote: > Add null check before dereferencing BasicObjectLock::_obj. > BasicObjectLocks are marked as free by setting _obj to null. > > I've done manual testing: > > > ./images/jdk/bin/java -Xlog:continuations=trace -XX:+VerifyContinuations --enable-preview VTSleepAfterUnlock > > > with the test attached to the JBS item. > > Example output: > > > [0.349s][trace][continuations] Interpreted frame (sp=0x000000011d5c6398 unextended sp=0x000000011d5c63b8, fp=0x000000011d5c6420, real_fp=0x000000011d5c6420, pc=0x00007f0ff0199c6a) > [0.349s][trace][continuations] ~return entry points [0x00007f0ff0199820, 0x00007f0ff019a2e8] 2760 bytes > [0.349s][trace][continuations] - local [0x000000011d5c3550]; #0 > [0.349s][trace][continuations] - local [0x000000011d5c3550]; #1 > [0.349s][trace][continuations] - local [0x0000000000000000]; #2 > [0.349s][trace][continuations] - stack [0x0000000000000064]; #1 > [0.349s][trace][continuations] - stack [0x0000000000000000]; #0 > [0.349s][trace][continuations] - obj [null] > [0.349s][trace][continuations] - lock [monitor mark(is_neutral no_hash age=0)] > [0.349s][trace][continuations] - monitor[0x000000011d5c63d8] > [0.349s][trace][continuations] - bcp [0x00007f0fa8400401]; @17 > [0.349s][trace][continuations] - locals [0x000000011d5c6440] > [0.349s][trace][continuations] - method [0x00007f0fa8400430]; virtual void VTSleepAfterUnlock.sleepAfterUnlock() Seems quite reasonable. I'm guessing we now print frames in different contexts to what we used to and so now find unlocked BasicObjectLocks. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/10486 From rehn at openjdk.org Tue Oct 4 08:50:57 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 4 Oct 2022 08:50:57 GMT Subject: RFR: 8289004: investigate if SharedRuntime::get_java_tid parameter should be a JavaThread* In-Reply-To: References: Message-ID: On Tue, 4 Oct 2022 04:23:19 GMT, David Holmes wrote: >> Hi, please consider. >> >> Yes it should be JavaThread*. >> But some additional places needed to use JavaThread* >> >> Passes t1-3. > > src/hotspot/share/runtime/sharedRuntime.cpp line 1004: > >> 1002: return 0; >> 1003: } >> 1004: guarantee(Thread::current() != thread || JavaThread::cast(thread)->is_oop_safe(), > > You don't need the cast any more. Thanks, fixing! 
> src/hotspot/share/runtime/sharedRuntime.cpp line 1006: > >> 1004: guarantee(Thread::current() != thread || JavaThread::cast(thread)->is_oop_safe(), >> 1005: "current cannot touch oops after its GC barrier is detached."); >> 1006: oop obj = JavaThread::cast(thread)->threadObj(); > > Again no cast needed. Thanks, fixing! ------------- PR: https://git.openjdk.org/jdk/pull/10532 From qamai at openjdk.org Tue Oct 4 09:11:28 2022 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 4 Oct 2022 09:11:28 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v11] In-Reply-To: References: <_Ghl2lsnrBhiWvVD3TMiwGo6SfQLl6idczb1QVqLa_I=.7cfa48e2-2987-43e0-a689-0e3462e4d270@github.com> Message-ID: On Tue, 4 Oct 2022 06:49:53 GMT, Jatin Bhateja wrote: >> @merykitty Thanks for the suggestion. I will update the instruct to use kmovwl. I will also experiment with kshiftrw and let you know. > >> You can use `kmovwl` instead which will relax the avx512bw constraint, however, you will need avx512vl for `evcvtps2ph`. Thanks. > > Yes, in general all AVX512VL targets support AVX512BW, but cloud instances give freedom to enable custom features. Regarding K0, as per section "15.6.1.1" of SDM, expectation is that K0 can appear in source and destination of regular non predication context, k0 should always contain all true mask so it should be unmodifiable for subsequent usages i.e. should not be present as destination of a mask manipulating instruction. Your suggestion is to have that in source but it may not work either. Changing existing sequence to use kmovw and replace AVX512BW with AVX512VL will again mean introducing an additional predication check for this pattern. Ah I get it, the encoding of k0 is treated specially in predicated instructions to refer to an all-set mask, but the register itself may not actually contain that value. So usage in `kshiftrw` may fail. In that case I think we can generate an all-set mask on the fly using `kxnorw(ktmp, ktmp)` to save a GPR in this occasion. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/9781 From adinn at redhat.com Tue Oct 4 10:33:38 2022 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 4 Oct 2022 11:33:38 +0100 Subject: RFC: linux-aarch64 and LSE support In-Reply-To: <9E71444D-98A2-4179-A618-54027E6BC73B@oracle.com> References: <7BD2A887-C204-4A36-8F2E-FA7386C17E2D@oracle.com> <28764c2d-6963-98ca-1212-b296d649e513@littlepinkcloud.com> <268A90AB-A4F1-44CB-A027-C07A12F9209B@oracle.com> <416c7338-2dff-7997-c95f-c9e1c74de180@littlepinkcloud.com> <0958C022-824B-4492-9D82-393F64965D5E@oracle.com> <246e397f-2f10-84fa-19d7-65b1472606e9@littlepinkcloud.com> <4a767b25-1539-183a-e4cd-4852f633e77d@littlepinkcloud.com> <9E71444D-98A2-4179-A618-54027E6BC73B@oracle.com> Message-ID: <2658fcca-a01c-ae55-161c-af5d08977109@redhat.com> On 03/10/2022 20:16, Kim Barrett wrote: > My question is not whether using an acq-rel operation is sufficient to remove > the need for a preceding fence, it's whether that's *necessary*? Specifically, > is a release operation also sufficient, as discussed in the kernel patch? > https://patchwork.kernel.org/patch/3575821/ Andrew is on PTO at the moment so I'll respond for now. He can (and will :-) correct me later if needed. I don't believe any of Andrew's comments in this thread are questioning Will Deacon's statements regarding cases where the preceding fence can or cannot be dropped. I read them as all being about the need for a trailing fence. 
The issue with AArch64 has always been that a releasing store has the potential, per the original spec, to be asymmetric in the way it orders visibility. Obviously, the spec guarantees visibility of all stores preceding the releasing store in program order before the releasing store itself becomes visible. It does not imply any visibility ordering guarantee for stores that follow the releasing store in program order. Hence the need for a DMB. This asymmetry is something that often surprises those coming to AArch64 from a TSO architecture like x86. As Andrew mentioned, a recent spec change means that the situation is now different when the releasing store is also acquiring and is an atomic op. In that specific case, the spec change means that the op orders visibility of its store wrt both (program order) preceding and (program order) following stores. > That kernel patch argues that for ll/sc atomics only a release operation is > needed (except for cmpxchg). And that still seems to hold for the linux > kernel - it uses ldxr/stlxr with a trailing dmb ish. Are you claiming > otherwise? (For some reason 8261027 used ldaxr rather than ldxr in the new .S > file, without any justification for that change.) I didn't read anything he said as claiming otherwise. I assume the use of ldaxr was unintended. My view is that it ought not to be needed. > We also need to understand what forms can/should be used for LSE. (Maybe a > release operation with trailing fence works there too? But let's not go > there.) I had much the same reaction as Dean about this, which your reply to > him seems to agree with. Specifically, we only need an acq-rel LSE to get the > behavior we want from LSE atomics. Yes, this seems to be what the spec now guarantees. > So I think that for memory_order_conservative we want to use: > > 1. For ll/sc cmpxchg: ldxr/stxr with leading and trailing dmb ish, e.g. a > Relaxed operation with both leading and trailing fences. Looks right to me. > 2. For ll/sc non-cmpxchg: ldxr/stlxr with trailing dmb ish, e.g. a Release > operation with a trailing fence. Also looks right to me. > 3. For LSE: an acq-rel instruction for all operations. Also looks right to me. > That looks like what the linux kernel is using. It also mostly agrees with > a recent change to gcc outline-atomics support: > > https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=bc25483c055d62f94f8c289f80843dda3c4a6ff4 > 2022-05-13 Sebastian Pop > PR target/105162 > (included in gcc12.2) > > Unfortunately, that change doesn't implement ll/sc cmpxchg the way we want > (item 1), instead using ldxr/stlxr+dmb. That's wrong, according to the > afore-referenced linux kernel patch (an analysis I agree with). (There is also > the problem of the __sync suite not having something for Atomic::xchg.) Hmm, that is ... unfortunate :-/ > So how do we get the code we want? There doesn't seem to be a way for us to > use gcc intrinsics. The "legacy" __sync_xxx operations are documented as > "full barriers" as we want, and the above referenced change comes pretty > close. But we couldn't use that right now even if that change perfectly > matched what we want, as it is far too new. (After digging into some of this > I have a great deal of sympathy for ErikO's position here: > https://mail.openjdk.org/pipermail/hotspot-dev/2019-November/039931.html) Yeah, Erik's point was always quite telling and it definitely seems to bite here. > So I think we are (at least for now) stuck with rolling our own. 
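To illustrate with a two-instruction fragment (hand-written, not JDK code):

    stlxr w8, x1, [x0]    // releasing store to location A
    str   x2, [x3]        // later plain store to location B

The stlxr cannot become visible before the stores that precede it, but nothing in the release semantics stops the later store to B from becoming visible before the store to A; only a following dmb ish (or an acquiring form of the atomic) orders the two.
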
What we have > has some problems, and I think can be improved in various ways (for example, I > think it is possible to dispense with runtime stub generation). I'm planning > to offer some PRs along those lines. Ok, thanks for the update. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From rehn at openjdk.org Tue Oct 4 11:22:49 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 4 Oct 2022 11:22:49 GMT Subject: RFR: 8289004: investigate if SharedRuntime::get_java_tid parameter should be a JavaThread* [v2] In-Reply-To: References: Message-ID: <55WCj3MD3F946fMzvklVUUDna-pe1BOU9aNHhzMk2AM=.7e7bb4fe-dc48-4857-b8c9-fa9f7d0baacd@github.com> > Hi, please consider. > > Yes it should be JavaThread*. > But some additional places needed to use JavaThread* > > Passes t1-3. Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Fixed review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10532/files - new: https://git.openjdk.org/jdk/pull/10532/files/17936e3d..a24db865 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10532&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10532&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10532.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10532/head:pull/10532 PR: https://git.openjdk.org/jdk/pull/10532 From rrich at openjdk.org Tue Oct 4 11:52:11 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 4 Oct 2022 11:52:11 GMT Subject: RFR: 8294580: frame::interpreter_frame_print_on() crashes if free BasicObjectLock exists in frame In-Reply-To: References: Message-ID: On Thu, 29 Sep 2022 12:49:27 GMT, Richard Reingruber wrote: > Add null check before dereferencing BasicObjectLock::_obj. > BasicObjectLocks are marked as free by setting _obj to null. > > I've done manual testing: > > > ./images/jdk/bin/java -Xlog:continuations=trace -XX:+VerifyContinuations --enable-preview VTSleepAfterUnlock > > > with the test attached to the JBS item. > > Example output: > > > [0.349s][trace][continuations] Interpreted frame (sp=0x000000011d5c6398 unextended sp=0x000000011d5c63b8, fp=0x000000011d5c6420, real_fp=0x000000011d5c6420, pc=0x00007f0ff0199c6a) > [0.349s][trace][continuations] ~return entry points [0x00007f0ff0199820, 0x00007f0ff019a2e8] 2760 bytes > [0.349s][trace][continuations] - local [0x000000011d5c3550]; #0 > [0.349s][trace][continuations] - local [0x000000011d5c3550]; #1 > [0.349s][trace][continuations] - local [0x0000000000000000]; #2 > [0.349s][trace][continuations] - stack [0x0000000000000064]; #1 > [0.349s][trace][continuations] - stack [0x0000000000000000]; #0 > [0.349s][trace][continuations] - obj [null] > [0.349s][trace][continuations] - lock [monitor mark(is_neutral no_hash age=0)] > [0.349s][trace][continuations] - monitor[0x000000011d5c63d8] > [0.349s][trace][continuations] - bcp [0x00007f0fa8400401]; @17 > [0.349s][trace][continuations] - locals [0x000000011d5c6440] > [0.349s][trace][continuations] - method [0x00007f0fa8400430]; virtual void VTSleepAfterUnlock.sleepAfterUnlock() Hi David, > Seems quite reasonable. I'm guessing we now print frames in different contexts to what we used to and so now find unlocked BasicObjectLocks. 
The fix for JDK-8290718 changed `VerifyStackChunkFrameClosure::do_frame()` to call `StackChunkFrameStream::print_on()`which delegates to `frame::interpreter_frame_print_on()`. Before that `AllocatedObj::print_value_on()` was called (see https://github.com/openjdk/jdk/commit/f714ac52bfe95b5a94e3994656438ef2aeab2c86#diff-a8b7bc88a1deed1885629c925d53059a9835f58ae29ec4bce7503d31e1029495). Thanks for the review, Richard. ------------- PR: https://git.openjdk.org/jdk/pull/10486 From aph-open at littlepinkcloud.com Tue Oct 4 12:24:21 2022 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Tue, 4 Oct 2022 13:24:21 +0100 Subject: RFC: linux-aarch64 and LSE support In-Reply-To: <2658fcca-a01c-ae55-161c-af5d08977109@redhat.com> References: <7BD2A887-C204-4A36-8F2E-FA7386C17E2D@oracle.com> <28764c2d-6963-98ca-1212-b296d649e513@littlepinkcloud.com> <268A90AB-A4F1-44CB-A027-C07A12F9209B@oracle.com> <416c7338-2dff-7997-c95f-c9e1c74de180@littlepinkcloud.com> <0958C022-824B-4492-9D82-393F64965D5E@oracle.com> <246e397f-2f10-84fa-19d7-65b1472606e9@littlepinkcloud.com> <4a767b25-1539-183a-e4cd-4852f633e77d@littlepinkcloud.com> <9E71444D-98A2-4179-A618-54027E6BC73B@oracle.com> <2658fcca-a01c-ae55-161c-af5d08977109@redhat.com> Message-ID: Just a couple of things... On 10/4/22 11:33, Andrew Dinn wrote: > On 03/10/2022 20:16, Kim Barrett wrote: ... >> So I think that for memory_order_conservative we want to use: >> >> 1. For ll/sc cmpxchg: ldxr/stxr with leading and trailing dmb ish, e.g. a >> Relaxed operation with both leading and trailing fences. > > Looks right to me. > >> 2. For ll/sc non-cmpxchg: ldxr/stlxr with trailing dmb ish, e.g. a Release >> operation with a trailing fence. > > Also looks right to me. > >> 3. For LSE: an acq-rel instruction for all operations. > > Also looks right to me. > >> That looks like what the linux kernel is using. It also mostly agrees with >> a recent change to gcc outline-atomics support: >> >> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=bc25483c055d62f94f8c289f80843dda3c4a6ff4 >> 2022-05-13 Sebastian Pop >> PR target/105162 >> (included in gcc12.2) >> >> Unfortunately, that change doesn't implement ll/sc cmpxchg the way we want >> (item 1), instead using ldxr/stlxr+dmb. That's wrong, It's not wrong, but it is different: GCC's CAS atomics have never guaranteed ordering if the CAS fails. In other words, it's not possible to synchronize with a store that did not take place. >> according to the >> afore-referenced linux kernel patch (an analysis I agree with). (There is also >> the problem of the __sync suite not having something for Atomic::xchg.) Huh? __atomic_exchange_n() . >> So how do we get the code we want? There doesn't seem to be a way for us to >> use gcc intrinsics. One way to solve this is, I suspect, to realize that ldxr/stxr is only used when LSE is not available, and all contemporary AArch64 implementations support LSE. Therefore, ldxr/stxr is legacy only, and it barely matters if it's somewhat suboptimal. So, maybe all we have to do is use GCC's operations and throw in a DMB ISH after __atomic ops, when needed. To do that we could use outline atomics and a if (!LSE) __sync_synchronize(); or with a function pointer (*maybe_dmb)(); in the case of memory_order_conservative. That should add a single well- predicted branch. We could benchmark that and see if it'll do. We could also do something like this: if (CAS failed) __sync_synchronize(); in the non_LSE case. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
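A rough sketch of the conditional-barrier idea above, purely for illustration. The `have_lse` flag is an invented stand-in for a runtime CPU-feature check, and the exact builtins and orderings are assumptions, not a proposed patch:

```c++
#include <stdint.h>

// Invented flag: assume it is populated from AArch64 feature detection at startup.
static bool have_lse = false;

// memory_order_conservative-style cmpxchg built on the GCC __atomic builtins.
static inline int64_t cmpxchg_conservative(volatile int64_t* dest,
                                           int64_t compare, int64_t exchange) {
  int64_t expected = compare;
  // With LSE this can compile to an acq-rel CAS (casal); without it, to an
  // ld(a)xr/stlxr loop or an outline-atomics call, which is where the extra
  // fence below matters.
  __atomic_compare_exchange_n(dest, &expected, exchange, /*weak=*/false,
                              __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
  if (!have_lse) {
    __sync_synchronize();   // trailing dmb ish on the non-LSE path
  }
  return expected;          // old value
}

// Full-fence exchange: acq-rel swap when LSE is present, else add the barrier.
static inline int64_t xchg_conservative(volatile int64_t* dest, int64_t value) {
  int64_t old = __atomic_exchange_n(dest, value, __ATOMIC_ACQ_REL);
  if (!have_lse) {
    __sync_synchronize();
  }
  return old;
}
```

Whether the compiler's acq-rel mapping plus a trailing barrier gives exactly the conservative semantics HotSpot wants (including ordering after a failed CAS) is precisely the question in this thread, so the sketch is a starting point rather than a conclusion.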
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jwaters at openjdk.org Tue Oct 4 12:29:29 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 4 Oct 2022 12:29:29 GMT Subject: RFR: 8292016: Cleanup legacy error reporting in the JDK outside of HotSpot [v35] In-Reply-To: References: Message-ID: On Tue, 4 Oct 2022 05:44:27 GMT, David Holmes wrote: > > JNU_ThrowByNameWithStrerror > > JNU_ThrowByNamePerror > > JNU_ThrowIOExceptionWithIOError > > The problem with this kind of naming is that the user of the API for which these functions are then called, should not have to know anything about the origins of the actual error code/string. The call-sites really do want to say `ThrowXXXWithLastError`. That's a good point for the first 2, but I do feel like it would be helpful to specify the kind of error the utilities that are more specialized (NET_ThrowNewWithLastError and JNU_ThrowIOExceptionWithLastError for instance), since both use very different subsystems on Windows and POSIX, which means they can be segregated accordingly. In my opinion the specific error type could be specified after "Last" instead of replacing it as a compromise. The tricky part comes with the first 2 (JNU_ThrowByNameWithLastError and JNU_ThrowByNameWithMessageAndLastError), which are for general use, but their original implementations fell back to the initial problematic mixing of WIN32 and errno errors, and leaving their names unchanged might just result in more ambiguity at the callsites they're used at. I also don't think I came up with particularly good names for them though, and honestly I'm at a bit of a loss as to what should be done with them at this point, hopefully more reviews can come in wit h some insight on this end. > Maybe I'm wrong to believe that they can be ignorant of the details but the level of abstraction seems wrong to me. The issue with that was Thomas's initial concerns that you can't really use the last error like this without at least some knowledge about the origin and nature of the actual error itself (WIN32 completely bypassing errno on Windows, and APIs on POSIX arbitrarily using it sometimes but using a return value instead to signal an error other times). I did try to address that by making it a little more specific with this patch, but it still seems there's a better way to do it than this... I'll try to address the other reviews in the meantime though, before coming back to this. ------------- PR: https://git.openjdk.org/jdk/pull/9870 From jwaters at openjdk.org Tue Oct 4 12:31:02 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 4 Oct 2022 12:31:02 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v6] In-Reply-To: References: Message-ID: > Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge branch 'openjdk:master' into patch-4 - Merge branch 'openjdk:master' into patch-4 - Back out change to DLL_ERROR4 for separate RFE - Missing spacing between errors - Introduce warning when system error cannot be determined - LoadLibrary checks should explicitly use NULL - Dump error (if any) when libraries fail to load - Prettify DLL_ERROR4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9749/files - new: https://git.openjdk.org/jdk/pull/9749/files/284ac2f4..93306172 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=04-05 Stats: 19121 lines in 447 files changed: 10613 ins; 6888 del; 1620 mod Patch: https://git.openjdk.org/jdk/pull/9749.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9749/head:pull/9749 PR: https://git.openjdk.org/jdk/pull/9749 From jwaters at openjdk.org Tue Oct 4 12:33:20 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 4 Oct 2022 12:33:20 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v7] In-Reply-To: References: Message-ID: > Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Update java_md.h ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9749/files - new: https://git.openjdk.org/jdk/pull/9749/files/93306172..9cb28eab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=05-06 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9749.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9749/head:pull/9749 PR: https://git.openjdk.org/jdk/pull/9749 From jwaters at openjdk.org Tue Oct 4 12:44:28 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 4 Oct 2022 12:44:28 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v8] In-Reply-To: References: Message-ID: > Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. 
Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Update java_md.c ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9749/files - new: https://git.openjdk.org/jdk/pull/9749/files/9cb28eab..2f3e12ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=06-07 Stats: 74 lines in 1 file changed: 70 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9749.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9749/head:pull/9749 PR: https://git.openjdk.org/jdk/pull/9749 From jwaters at openjdk.org Tue Oct 4 12:46:50 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 4 Oct 2022 12:46:50 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v9] In-Reply-To: References: Message-ID: > Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Revert changes to JLI_ReportErrorMessageSys ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9749/files - new: https://git.openjdk.org/jdk/pull/9749/files/2f3e12ce..a3762867 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=07-08 Stats: 10 lines in 1 file changed: 6 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9749.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9749/head:pull/9749 PR: https://git.openjdk.org/jdk/pull/9749 From jwaters at openjdk.org Tue Oct 4 12:50:22 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 4 Oct 2022 12:50:22 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v10] In-Reply-To: References: Message-ID: > Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Make DLL_ERROR4 look a little better without changing what it means ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9749/files - new: https://git.openjdk.org/jdk/pull/9749/files/a3762867..0a0b56eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9749.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9749/head:pull/9749 PR: https://git.openjdk.org/jdk/pull/9749 From jwaters at openjdk.org Tue Oct 4 13:03:26 2022 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 4 Oct 2022 13:03:26 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v10] In-Reply-To: References: Message-ID: On Tue, 4 Oct 2022 12:50:22 GMT, Julian Waters wrote: >> Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. 
>> >> See https://bugs.openjdk.org/browse/JDK-8292016 and the related Pull Request for the reason that the existing JLI error reporting utility was not used in this enhancement > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Make DLL_ERROR4 look a little better without changing what it means Results from usage in JDK-8288293 Failure to load the JVM (before): `Error: loading: D:\Eclipse\Workspace\HotSpot\jdk\build\windows-x86_64-server-release\jdk\bin\server\jvm.dll` Failure to load the JVM (after): `Error: Could not load D:\Eclipse\Workspace\HotSpot\jdk\build\windows-x86_64-server-release\jdk\bin\server\jvm.dll: The specified module could not be found` ------------- PR: https://git.openjdk.org/jdk/pull/9749 From cjplummer at openjdk.org Tue Oct 4 18:23:32 2022 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 4 Oct 2022 18:23:32 GMT Subject: RFR: 8294321: Fix typos in files under test/jdk/java, test/jdk/jdk, test/jdk/jni [v2] In-Reply-To: References: Message-ID: On Wed, 28 Sep 2022 15:32:11 GMT, Michael Ernst wrote: > The title was edited by someone other than me, as you can see from the PR history. The PR title needs to match the CR synopsis, so update the CR first, and then update the PR. ------------- PR: https://git.openjdk.org/jdk/pull/10029 From prr at openjdk.org Tue Oct 4 18:42:44 2022 From: prr at openjdk.org (Phil Race) Date: Tue, 4 Oct 2022 18:42:44 GMT Subject: RFR: JDK-8294618: Update openjdk.java.net => openjdk.org [v8] In-Reply-To: References: Message-ID: On Mon, 3 Oct 2022 20:37:11 GMT, Joe Darcy wrote: >> With the domain change from openjdk.java.net to openjdk.org, references to URLs in the sources should be updated. >> >> Updates were made using a shell script. I"ll run a copyright updater before any push. > > Joe Darcy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Update accessibility URLs. > - Merge branch 'master' into JDK-8294618 > - Update make directory. > - Update doc directory files. > - Update hg URLs to git. > - Merge branch 'master' into JDK-8294618 > - http -> https > - Undo manpage update. > - Update copyright. > - Revert unintended update to binary file. > - ... and 1 more: https://git.openjdk.org/jdk/compare/571e4932...4055f1a6 Marked as reviewed by prr (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10501 From dholmes at openjdk.org Wed Oct 5 01:47:27 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 5 Oct 2022 01:47:27 GMT Subject: RFR: 8289004: investigate if SharedRuntime::get_java_tid parameter should be a JavaThread* [v2] In-Reply-To: <55WCj3MD3F946fMzvklVUUDna-pe1BOU9aNHhzMk2AM=.7e7bb4fe-dc48-4857-b8c9-fa9f7d0baacd@github.com> References: <55WCj3MD3F946fMzvklVUUDna-pe1BOU9aNHhzMk2AM=.7e7bb4fe-dc48-4857-b8c9-fa9f7d0baacd@github.com> Message-ID: On Tue, 4 Oct 2022 11:22:49 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> Yes it should be JavaThread*. >> But some additional places needed to use JavaThread* >> >> Passes t1-3. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Fixed review comments Marked as reviewed by dholmes (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/10532 From rehn at openjdk.org Wed Oct 5 06:14:06 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Oct 2022 06:14:06 GMT Subject: RFR: 8289004: investigate if SharedRuntime::get_java_tid parameter should be a JavaThread* [v2] In-Reply-To: <55WCj3MD3F946fMzvklVUUDna-pe1BOU9aNHhzMk2AM=.7e7bb4fe-dc48-4857-b8c9-fa9f7d0baacd@github.com> References: <55WCj3MD3F946fMzvklVUUDna-pe1BOU9aNHhzMk2AM=.7e7bb4fe-dc48-4857-b8c9-fa9f7d0baacd@github.com> Message-ID: On Tue, 4 Oct 2022 11:22:49 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> Yes it should be JavaThread*. >> But some additional places needed to use JavaThread* >> >> Passes t1-3. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Fixed review comments Thanks David! ------------- PR: https://git.openjdk.org/jdk/pull/10532 From ihse at openjdk.org Wed Oct 5 07:43:19 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 5 Oct 2022 07:43:19 GMT Subject: RFR: JDK-8294618: Update openjdk.java.net => openjdk.org [v8] In-Reply-To: References: Message-ID: On Mon, 3 Oct 2022 20:37:11 GMT, Joe Darcy wrote: >> With the domain change from openjdk.java.net to openjdk.org, references to URLs in the sources should be updated. >> >> Updates were made using a shell script. I"ll run a copyright updater before any push. > > Joe Darcy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Update accessibility URLs. > - Merge branch 'master' into JDK-8294618 > - Update make directory. > - Update doc directory files. > - Update hg URLs to git. > - Merge branch 'master' into JDK-8294618 > - http -> https > - Undo manpage update. > - Update copyright. > - Revert unintended update to binary file. > - ... and 1 more: https://git.openjdk.org/jdk/compare/72d7bf5d...4055f1a6 Marked as reviewed by ihse (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10501 From jsjolen at openjdk.org Wed Oct 5 09:46:21 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 5 Oct 2022 09:46:21 GMT Subject: RFR: 8289004: investigate if SharedRuntime::get_java_tid parameter should be a JavaThread* [v2] In-Reply-To: <55WCj3MD3F946fMzvklVUUDna-pe1BOU9aNHhzMk2AM=.7e7bb4fe-dc48-4857-b8c9-fa9f7d0baacd@github.com> References: <55WCj3MD3F946fMzvklVUUDna-pe1BOU9aNHhzMk2AM=.7e7bb4fe-dc48-4857-b8c9-fa9f7d0baacd@github.com> Message-ID: On Tue, 4 Oct 2022 11:22:49 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> Yes it should be JavaThread*. >> But some additional places needed to use JavaThread* >> >> Passes t1-3. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Fixed review comments This looks good to me, thanks for the fix. ------------- Marked as reviewed by jsjolen (Author). 
PR: https://git.openjdk.org/jdk/pull/10532 From rehn at openjdk.org Wed Oct 5 09:53:16 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Oct 2022 09:53:16 GMT Subject: RFR: 8289004: investigate if SharedRuntime::get_java_tid parameter should be a JavaThread* [v2] In-Reply-To: <55WCj3MD3F946fMzvklVUUDna-pe1BOU9aNHhzMk2AM=.7e7bb4fe-dc48-4857-b8c9-fa9f7d0baacd@github.com> References: <55WCj3MD3F946fMzvklVUUDna-pe1BOU9aNHhzMk2AM=.7e7bb4fe-dc48-4857-b8c9-fa9f7d0baacd@github.com> Message-ID: On Tue, 4 Oct 2022 11:22:49 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> Yes it should be JavaThread*. >> But some additional places needed to use JavaThread* >> >> Passes t1-3. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Fixed review comments Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/10532 From jwaters at openjdk.org Wed Oct 5 11:00:16 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 5 Oct 2022 11:00:16 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v11] In-Reply-To: References: Message-ID: > Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. > > See https://bugs.openjdk.org/browse/JDK-8292016 and the related Pull Request for the reason that the existing JLI error reporting utility was not used in this enhancement Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Merge branch 'openjdk:master' into patch-4 - Make DLL_ERROR4 look a little better without changing what it means - Revert changes to JLI_ReportErrorMessageSys - Update java_md.c - Update java_md.h - Merge branch 'openjdk:master' into patch-4 - Merge branch 'openjdk:master' into patch-4 - Back out change to DLL_ERROR4 for separate RFE - Missing spacing between errors - Introduce warning when system error cannot be determined - ... and 3 more: https://git.openjdk.org/jdk/compare/32b83751...e7bef513 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9749/files - new: https://git.openjdk.org/jdk/pull/9749/files/0a0b56eb..e7bef513 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=09-10 Stats: 837 lines in 37 files changed: 462 ins; 259 del; 116 mod Patch: https://git.openjdk.org/jdk/pull/9749.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9749/head:pull/9749 PR: https://git.openjdk.org/jdk/pull/9749 From jwaters at openjdk.org Wed Oct 5 11:03:41 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 5 Oct 2022 11:03:41 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v12] In-Reply-To: References: Message-ID: > Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. 
> > See https://bugs.openjdk.org/browse/JDK-8292016 and the related Pull Request for the reason that the existing JLI error reporting utility was not used in this enhancement Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Use - instead of : as a separator ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9749/files - new: https://git.openjdk.org/jdk/pull/9749/files/e7bef513..fb62a9a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=10-11 Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9749.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9749/head:pull/9749 PR: https://git.openjdk.org/jdk/pull/9749 From jwaters at openjdk.org Wed Oct 5 11:03:44 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 5 Oct 2022 11:03:44 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v11] In-Reply-To: References: Message-ID: <7-AzrxeV4UIRSEZSD9MeAc5zyGZdxtGVuvq4_5cutoc=.a7869d62-7e70-455d-9a38-8eab21052f36@github.com> On Wed, 5 Oct 2022 11:00:16 GMT, Julian Waters wrote: >> Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. >> >> See https://bugs.openjdk.org/browse/JDK-8292016 and the related Pull Request for the reason that the existing JLI error reporting utility was not used in this enhancement > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Merge branch 'openjdk:master' into patch-4 > - Make DLL_ERROR4 look a little better without changing what it means > - Revert changes to JLI_ReportErrorMessageSys > - Update java_md.c > - Update java_md.h > - Merge branch 'openjdk:master' into patch-4 > - Merge branch 'openjdk:master' into patch-4 > - Back out change to DLL_ERROR4 for separate RFE > - Missing spacing between errors > - Introduce warning when system error cannot be determined > - ... and 3 more: https://git.openjdk.org/jdk/compare/e9cb88a8...e7bef513 Minor change: From `Error: Could not load D:\Eclipse\Workspace\HotSpot\jdk\build\windows-x86_64-server-release\jdk\bin\server\jvm.dll: The specified module could not be found` to `Error: Could not load D:\Eclipse\Workspace\HotSpot\jdk\build\windows-x86_64-server-release\jdk\bin\server\jvm.dll - The specified module could not be found` ------------- PR: https://git.openjdk.org/jdk/pull/9749 From rehn at openjdk.org Wed Oct 5 12:48:37 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 5 Oct 2022 12:48:37 GMT Subject: Integrated: 8289004: investigate if SharedRuntime::get_java_tid parameter should be a JavaThread* In-Reply-To: References: Message-ID: On Mon, 3 Oct 2022 12:59:21 GMT, Robbin Ehn wrote: > Hi, please consider. > > Yes it should be JavaThread*. > But some additional places needed to use JavaThread* > > Passes t1-3. This pull request has now been integrated. 
Changeset: 979efd41 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/979efd4174968802f0c170e768671507a11e118e Stats: 26 lines in 5 files changed: 3 ins; 3 del; 20 mod 8289004: investigate if SharedRuntime::get_java_tid parameter should be a JavaThread* Reviewed-by: dholmes, jsjolen ------------- PR: https://git.openjdk.org/jdk/pull/10532 From dmitry.samersoff at bell-sw.com Wed Oct 5 12:55:13 2022 From: dmitry.samersoff at bell-sw.com (Dmitry Samersoff) Date: Wed, 5 Oct 2022 15:55:13 +0300 Subject: Q: Should we use 64bit atomic in x86_64 patch_verified_entry code? Message-ID: Hello Everybody, I'm working on a crash that seems to be related to CMC[1] - the JVM crashes when a method becomes non-re-entrant because a JavaThread executing a compiled method reaches an instruction partially-assembled during patching of the verified entry point. In void NativeJump::patch_verified_entry() we atomically patch the first 4 bytes, then atomically patch the 5th byte, then atomically patch the first 4 bytes again. Would it be better (from a CMC point of view) to atomically patch 8 bytes at once? 1. http://cr.openjdk.java.net/~jrose/jvm/hotspot-cmc.html -Dmitry -- Dmitry.Samersoff at bell-sw.com Technical Professional at BellSoft From aivanov at openjdk.org Wed Oct 5 13:27:29 2022 From: aivanov at openjdk.org (Alexey Ivanov) Date: Wed, 5 Oct 2022 13:27:29 GMT Subject: RFR: 8294321: Fix typos in files under test/jdk/java, test/jdk/jdk, test/jdk/jni [v2] In-Reply-To: References: Message-ID: On Mon, 26 Sep 2022 16:51:36 GMT, Michael Ernst wrote: >> 8294321: Fix typos in files under test/jdk/java, test/jdk/jdk, test/jdk/jni > > Michael Ernst has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Reinstate typos in Apache code that is copied into the JDK > - Merge ../jdk-openjdk into typos-typos > - Remove file that was removed upstream > - Fix inconsistency in capitalization > - Undo change in zlip > - Fix typos Changes requested by aivanov (Reviewer). src/hotspot/share/opto/memnode.cpp line 2365: > 2363: if (x != this) return x; > 2364: > 2365: // Take apart the address into an oop and offset. "and _an_ offset"? src/java.xml/share/classes/org/w3c/dom/Document.java line 293: > 291: * systemId, and notationName attributes are > 292: * copied. If a deep import is requested, the descendants > 293: * of the source Entity are recursively imported and This class may come from a 3rd party library. Anyone from `java.xml` can confirm it? test/hotspot/jtreg/vmTestbase/nsk/share/locks/DeadlockMaker.java line 31: > 29: /* > 30: * Class used to create deadlocked threads. It is possible create 2 or more deadlocked thread, also > 31: * is possible to specify resource of which type should lock each deadlocked thread Suggestion: * it is possible to specify resource of which type should lock each deadlocked thread It doesn't sound right without _"it"_. test/jdk/com/sun/jdi/connect/spi/GeneratedConnectors.java line 28: > 26: * @summary Unit test for "Pluggable Connectors and Transports" feature. > 27: * > 28: * When a transport service is deployed the virtual machine Suggestion: * When a transport service is deployed, the virtual machine Let's add a comma for clarity.
test/jdk/java/security/testlibrary/SimpleOCSPServer.java line 445: > 443: > 444: /** > 445: * Check the status database for revocation information on one or more Suggestion: * Check the status database for revocation information of one or more test/jdk/sun/jvmstat/testlibrary/utils.sh line 181: > 179: if [ $? -eq 0 ] > 180: then > 181: # it's still lingering, now it is hard Suggestion: # it's still lingering, now hit it hard ------------- PR: https://git.openjdk.org/jdk/pull/10029 From sjohanss at openjdk.org Wed Oct 5 13:28:24 2022 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 5 Oct 2022 13:28:24 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v3] In-Reply-To: References: Message-ID: On Fri, 23 Sep 2022 06:39:33 GMT, Kim Barrett wrote: >> 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal >> 8155996: Improve concurrent refinement green zone control >> 8134303: Introduce -XX:-G1UseConcRefinement >> >> Please review this change to the control of concurrent refinement. >> >> This new controller takes a different approach to the problem, addressing a >> number of issues. >> >> The old controller used a multiple of the target number of cards to determine >> the range over which increasing numbers of refinement threads should be >> activated, and finally activating mutator refinement. This has a variety of >> problems. It doesn't account for the processing rate, the rate of new dirty >> cards, or the time available to perform the processing. This often leads to >> unnecessary spikes in the number of running refinement threads. It also tends >> to drive the pending number to the target quickly and keep it there, removing >> the benefit from having pending dirty cards filter out new cards for nearby >> writes. It can't delay and leave excess cards in the queue because it could >> be a long time before another buffer is enqueued. >> >> The old controller was triggered by mutator threads enqueing card buffers, >> when the number of cards in the queue exceeded a threshold near the target. >> This required a complex activation protocol between the mutators and the >> refinement threads. >> >> With the new controller there is a primary refinement thread that periodically >> estimates how many refinement threads need to be running to reach the target >> in time for the next GC, along with whether to also activate mutator >> refinement. If the primary thread stops running because it isn't currently >> needed, it sleeps for a period and reevaluates on wakeup. This eliminates any >> involvement in the activation of refinement threads by mutator threads. >> >> The estimate of how many refinement threads are needed uses a prediction of >> time until the next GC, the number of buffered cards, the predicted rate of >> new dirty cards, and the predicted refinement rate. The number of running >> threads is adjusted based on these periodically performed estimates. >> >> This new approach allows more dirty cards to be left in the queue until late >> in the mutator phase, typically reducing the rate of new dirty cards, which >> reduces the amount of concurrent refinement work needed. >> >> It also smooths out the number of running refinement threads, eliminating the >> unnecessarily large spikes that are common with the old method. One benefit >> is that the number of refinement threads (lazily) allocated is often much >> lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem >> described in JDK-8153225.) 
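To make the shape of that estimate concrete, here is a back-of-the-envelope sketch. It is not the controller code from this change; the names and the exact formula are illustrative assumptions based on the description above:

```c++
#include <algorithm>
#include <cmath>

// Inputs would come from G1's predictors in the real code; here they are
// plain parameters. Returns how many refinement threads should run now.
static unsigned refinement_threads_wanted(double pending_cards,  // currently buffered
                                          double new_card_rate,  // predicted cards/sec
                                          double refine_rate,    // cards/sec per thread
                                          double time_to_gc_s,   // predicted time to next GC
                                          double target_cards,   // wanted pending at GC start
                                          unsigned max_threads) {
  if (time_to_gc_s <= 0.0 || refine_rate <= 0.0) {
    return max_threads;  // GC imminent or no prediction yet: be conservative
  }
  // Cards expected at GC start if nothing is refined until then.
  double predicted = pending_cards + new_card_rate * time_to_gc_s;
  double excess = predicted - target_cards;
  if (excess <= 0.0) {
    return 0;            // leave cards queued; they keep filtering nearby writes
  }
  double per_thread_capacity = refine_rate * time_to_gc_s;
  unsigned wanted = (unsigned)std::ceil(excess / per_thread_capacity);
  // Past the available threads the controller would also engage mutator
  // refinement (not modelled here).
  return std::min(wanted, max_threads);
}
```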
>> >> This change also provides a new method for calculating for the number of dirty >> cards that should be pending at the start of a GC. While this calculation is >> conceptually distinct from the thread control, the two were significanly >> intertwined in the old controller. Changing this calculation separately and >> first would have changed the behavior of the old controller in ways that might >> have introduced regressions. Changing it after the thread control was changed >> would have made it more difficult to test and measure the thread control in a >> desirable configuration. >> >> The old calculation had various problems that are described in JDK-8155996. >> In particular, it can get more or less stuck at low values, and is slow to >> respond to changes. >> >> The old controller provided a number of product options, none of which were >> very useful for real applications, and none of which are very applicable to >> the new controller. All of these are being obsoleted. >> >> -XX:-G1UseAdaptiveConcRefinement >> -XX:G1ConcRefinementGreenZone= >> -XX:G1ConcRefinementYellowZone= >> -XX:G1ConcRefinementRedZone= >> -XX:G1ConcRefinementThresholdStep= >> >> The new controller *could* use G1ConcRefinementGreenZone to provide a fixed >> value for the target number of cards, though it is poorly named for that. >> >> A configuration that was useful for some kinds of debugging and testing was to >> disable G1UseAdaptiveConcRefinement and set g1ConcRefinementGreenZone to a >> very large value, effectively disabling concurrent refinement. To support >> this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic >> option has been added (see JDK-8155996). >> >> The other options are meaningless for the new controller. >> >> Because of these option changes, a CSR and a release note need to accompany >> this change. >> >> Testing: >> mach5 tier1-6 >> various performance tests. >> local (linux-x64) tier1 with -XX:-G1UseConcRefinement >> >> Performance testing found no regressions, but also little or no improvement >> with default options, which was expected. With default options most of our >> performance tests do very little concurrent refinement. And even for those >> that do, while the old controller had a number of problems, the impact of >> those problems is small and hard to measure for most applications. >> >> When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare >> better, particularly when also reducing MaxGCPauseMillis. specjbb2015 with >> MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options >> held constant) showed a statistically significant improvement of about 4.5% >> for critical-jOPS. Using the changed controller, the difference between this >> configuration and the default is fairly small, while the baseline shows >> significant degradation with the more restrictive options. >> >> For all tests and configurations the new controller often creates many fewer >> refinement threads. > > Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: > > - wanted vs needed nomenclature > - remove several spurious "scan" > - delay => wait_time_ms Marked as reviewed by sjohanss (Reviewer). 
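As a footnote to the target-card discussion earlier in this description: conceptually, the "dirty cards pending at the start of a GC" budget falls out of the pause-time goal. The sketch below only illustrates that idea; it is not the formula used by the patch, and the cost-per-card input is an invented name:

```c++
#include <cstddef>

// -XX:MaxGCPauseMillis and -XX:G1RSetUpdatingPauseTimePercent are the real
// flags mentioned above; everything else here is illustrative.
static size_t target_pending_cards(double max_pause_ms,
                                   double updating_pause_percent,
                                   double predicted_cost_per_card_ms) {
  // Portion of the pause we are willing to spend scanning logged cards.
  double budget_ms = max_pause_ms * updating_pause_percent / 100.0;
  if (predicted_cost_per_card_ms <= 0.0) {
    return 0;
  }
  return (size_t)(budget_ms / predicted_cost_per_card_ms);
}
```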
------------- PR: https://git.openjdk.org/jdk/pull/10256 From aivanov at openjdk.org Wed Oct 5 14:17:13 2022 From: aivanov at openjdk.org (Alexey Ivanov) Date: Wed, 5 Oct 2022 14:17:13 GMT Subject: RFR: 8294321: Fix typos in files under test/jdk/java, test/jdk/jdk, test/jdk/jni [v2] In-Reply-To: References: Message-ID: On Mon, 26 Sep 2022 16:51:36 GMT, Michael Ernst wrote: >> 8294321: Fix typos in files under test/jdk/java, test/jdk/jdk, test/jdk/jni > > Michael Ernst has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Reinstate typos in Apache code that is copied into the JDK > - Merge ../jdk-openjdk into typos-typos > - Remove file that was removed upstream > - Fix inconsistency in capitalization > - Undo change in zlip > - Fix typos I agree with everyone who said the PR should be broken to smaller pieces so that it touches code / tests in one or two packages, modules. It would be easier to review, you would need to get an approval from reviewers in a one or two specific areas. At this time, this PR touches files in 11 areas according the number of labels which correspond to a specific mailing list where discussions for the area are held. ------------- PR: https://git.openjdk.org/jdk/pull/10029 From darcy at openjdk.org Wed Oct 5 16:39:32 2022 From: darcy at openjdk.org (Joe Darcy) Date: Wed, 5 Oct 2022 16:39:32 GMT Subject: RFR: JDK-8294618: Update openjdk.java.net => openjdk.org [v9] In-Reply-To: References: Message-ID: > With the domain change from openjdk.java.net to openjdk.org, references to URLs in the sources should be updated. > > Updates were made using a shell script. I"ll run a copyright updater before any push. Joe Darcy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'master' into JDK-8294618 - Update accessibility URLs. - Merge branch 'master' into JDK-8294618 - Update make directory. - Update doc directory files. - Update hg URLs to git. - Merge branch 'master' into JDK-8294618 - http -> https - Undo manpage update. - Update copyright. - ... and 2 more: https://git.openjdk.org/jdk/compare/8aa24fec...eba2bd4b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10501/files - new: https://git.openjdk.org/jdk/pull/10501/files/4055f1a6..eba2bd4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10501&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10501&range=07-08 Stats: 1452 lines in 87 files changed: 912 ins; 312 del; 228 mod Patch: https://git.openjdk.org/jdk/pull/10501.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10501/head:pull/10501 PR: https://git.openjdk.org/jdk/pull/10501 From jwaters at openjdk.org Wed Oct 5 16:40:16 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 5 Oct 2022 16:40:16 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v13] In-Reply-To: References: Message-ID: <8nmAhTC5tubNWCLn89kWO4hQaP9ILZvgkx1ZtqMS9yY=.c794b8f4-0e50-472b-9c9a-993a2b24d8d2@github.com> > Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. 
> > See https://bugs.openjdk.org/browse/JDK-8292016 and the related Pull Request for the reason that the existing JLI error reporting utility was not used in this enhancement Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - Merge branch 'openjdk:master' into patch-4 - Use - instead of : as a separator - Merge branch 'openjdk:master' into patch-4 - Make DLL_ERROR4 look a little better without changing what it means - Revert changes to JLI_ReportErrorMessageSys - Update java_md.c - Update java_md.h - Merge branch 'openjdk:master' into patch-4 - Merge branch 'openjdk:master' into patch-4 - Back out change to DLL_ERROR4 for separate RFE - ... and 5 more: https://git.openjdk.org/jdk/compare/8365afd1...aadf6275 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9749/files - new: https://git.openjdk.org/jdk/pull/9749/files/fb62a9a9..aadf6275 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=11-12 Stats: 131 lines in 21 files changed: 58 ins; 32 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/9749.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9749/head:pull/9749 PR: https://git.openjdk.org/jdk/pull/9749 From darcy at openjdk.org Wed Oct 5 16:52:27 2022 From: darcy at openjdk.org (Joe Darcy) Date: Wed, 5 Oct 2022 16:52:27 GMT Subject: Integrated: JDK-8294618: Update openjdk.java.net => openjdk.org In-Reply-To: References: Message-ID: On Fri, 30 Sep 2022 00:33:57 GMT, Joe Darcy wrote: > With the domain change from openjdk.java.net to openjdk.org, references to URLs in the sources should be updated. > > Updates were made using a shell script. I"ll run a copyright updater before any push. This pull request has now been integrated. Changeset: 536c9a51 Author: Joe Darcy URL: https://git.openjdk.org/jdk/commit/536c9a512ea90d97a1ae5310453410d55db98bdd Stats: 128 lines in 45 files changed: 0 ins; 0 del; 128 mod 8294618: Update openjdk.java.net => openjdk.org Reviewed-by: mikael, iris, joehw, prr, ihse ------------- PR: https://git.openjdk.org/jdk/pull/10501 From sspitsyn at openjdk.org Wed Oct 5 22:57:38 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 5 Oct 2022 22:57:38 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread Message-ID: The spec of JVM TI GetLocalXXX/SetLocalXXX functions is updated to require the target thread to be suspended. If not suspended then the JVMTI_ERROR_THREAD_NOT_SUSPENDED error code is returned by the implementation. The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 A few tests are impacted by this fix: test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 The following test has been removed as non-relevant any more: ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` New negative test has been added instead: ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` All JVM TI and JPDA tests were used locally for verification. They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. Mach5 test runs on all platforms are TBD. 
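For agent writers, the practical effect of the stricter spec looks roughly like the helper below. This is illustrative only (not code from this change); it assumes the can_suspend and can_access_local_variables capabilities were added, and the current thread can typically still access its own locals without suspension:

```c++
#include <jvmti.h>

static jvmtiError read_local_object(jvmtiEnv* jvmti, jthread target,
                                    jint depth, jint slot, jobject* value) {
  jvmtiError err = jvmti->SuspendThread(target);
  if (err != JVMTI_ERROR_NONE) {
    return err;
  }
  // Without the SuspendThread above, GetLocalObject on another thread is now
  // specified to fail with JVMTI_ERROR_THREAD_NOT_SUSPENDED rather than
  // sampling a possibly running frame.
  err = jvmti->GetLocalObject(target, depth, slot, value);

  jvmtiError resume_err = jvmti->ResumeThread(target);
  return err != JVMTI_ERROR_NONE ? err : resume_err;
}
```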
------------- Commit messages: - 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread Changes: https://git.openjdk.org/jdk/pull/10586/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10586&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8288387 Stats: 1198 lines in 12 files changed: 453 ins; 694 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/10586.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10586/head:pull/10586 PR: https://git.openjdk.org/jdk/pull/10586 From svkamath at openjdk.org Thu Oct 6 06:28:04 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Thu, 6 Oct 2022 06:28:04 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v13] In-Reply-To: References: Message-ID: > 8289552: Make intrinsic conversions between bit representations of half precision values and floats Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Updated instruct to use kmovw ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9781/files - new: https://git.openjdk.org/jdk/pull/9781/files/69999ce4..a00c3ecd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9781&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9781&range=11-12 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9781.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9781/head:pull/9781 PR: https://git.openjdk.org/jdk/pull/9781 From svkamath at openjdk.org Thu Oct 6 06:28:06 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Thu, 6 Oct 2022 06:28:06 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v11] In-Reply-To: References: <_Ghl2lsnrBhiWvVD3TMiwGo6SfQLl6idczb1QVqLa_I=.7cfa48e2-2987-43e0-a689-0e3462e4d270@github.com> Message-ID: On Tue, 4 Oct 2022 09:07:42 GMT, Quan Anh Mai wrote: >>> You can use `kmovwl` instead which will relax the avx512bw constraint, however, you will need avx512vl for `evcvtps2ph`. Thanks. >> >> Yes, in general all AVX512VL targets support AVX512BW, but cloud instances give freedom to enable custom features. Regarding K0, as per section "15.6.1.1" of SDM, expectation is that K0 can appear in source and destination of regular non predication context, k0 should always contain all true mask so it should be unmodifiable for subsequent usages i.e. should not be present as destination of a mask manipulating instruction. Your suggestion is to have that in source but it may not work either. Changing existing sequence to use kmovw and replace AVX512BW with AVX512VL will again mean introducing an additional predication check for this pattern. > > Ah I get it, the encoding of k0 is treated specially in predicated instructions to refer to an all-set mask, but the register itself may not actually contain that value. So usage in `kshiftrw` may fail. In that case I think we can generate an all-set mask on the fly using `kxnorw(ktmp, ktmp, ktmp)` to save a GPR in this occasion. Thanks. Hi @merykitty, I am seeing performance regression with kxnorw instruction. So I have updated the PR with kmovwl. Thanks. 
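For readers not steeped in the opmask details, the two alternatives weighed above look roughly like this when written with plain C intrinsics instead of the HotSpot macro assembler (illustrative only, not the PR's code):

```c++
#include <immintrin.h>

// kmovw route (what the PR now uses, per the message above): materialize the
// all-ones 16-bit mask through a GPR. KMOVW needs only AVX-512F.
static inline __mmask16 allones_mask_via_gpr() {
  return _mm512_int2mask(0xFFFF);
}

// kxnorw route: k = ~(k ^ k) is all ones without consuming a GPR, at the cost
// of an extra mask-ALU operation.
static inline __mmask16 allones_mask_via_kxnor(__mmask16 scratch) {
  return _mm512_kxnor(scratch, scratch);
}
```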
------------- PR: https://git.openjdk.org/jdk/pull/9781 From dsamersoff at openjdk.org Thu Oct 6 07:20:09 2022 From: dsamersoff at openjdk.org (Dmitry Samersoff) Date: Thu, 6 Oct 2022 07:20:09 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread In-Reply-To: References: Message-ID: <7KcCEr8Byi3GKWeNrTJJedUNsjS_8e0V2-rymIVxVrM=.9025b65b-980c-44d3-8e4d-9e78c9d5d513@github.com> On Wed, 5 Oct 2022 22:49:20 GMT, Serguei Spitsyn wrote: > The spec of JVM TI GetLocalXXX/SetLocalXXX functions is updated to require the target thread to be suspended. If not suspended then the JVMTI_ERROR_THREAD_NOT_SUSPENDED error code is returned by the implementation. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 > > A few tests are impacted by this fix: > > test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest > test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 > > > The following test has been removed as non-relevant any more: > ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` > > New negative test has been added instead: > ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` > > All JVM TI and JPDA tests were used locally for verification. > They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. > > Mach5 test runs on all platforms are TBD. src/hotspot/share/prims/jvmtiEnvBase.hpp line 180: > 178: JavaThread* current = JavaThread::current(); > 179: oop cur_obj = current->jvmti_vthread(); > 180: bool is_current = jt == current && (cur_obj == NULL || cur_obj == thr_obj); It might be better to restructure this "if" and check for jt==current before we ask for cur_obj, or at least add brackets. ------------- PR: https://git.openjdk.org/jdk/pull/10586 From rkennke at openjdk.org Thu Oct 6 07:47:02 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:02 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking Message-ID: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). 
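As a toy illustration of such a per-thread lock stack (not the data structure from this patch; the names, capacity and overflow policy are invented), the idea can be as simple as:

```c++
#include <cstddef>

typedef void* oop_t;   // stand-in for HotSpot's oop

class ToyLockStack {
  static const int CAPACITY = 8;   // stays tiny in practice (3-5 entries)
  oop_t _elems[CAPACITY];
  int _top = 0;

public:
  // Called after the fast-lock CAS on the object header succeeded.
  bool push(oop_t o) {
    if (_top == CAPACITY) return false;  // caller would fall back, e.g. inflate
    _elems[_top++] = o;
    return true;
  }
  // Called on unlock; with hand-over-hand locking o need not be on top.
  void remove(oop_t o) {
    for (int i = _top - 1; i >= 0; i--) {
      if (_elems[i] == o) {
        _elems[i] = _elems[--_top];      // order is irrelevant for ownership
        return;
      }
    }
  }
  // The hot query: "does the current thread own this object?"
  bool contains(oop_t o) const {
    for (int i = 0; i < _top; i++) {
      if (_elems[i] == o) return true;
    }
    return false;
  }
};
```

In the real implementation the entries are also GC roots and are scanned during thread scanning, as described above.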
Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. This change enables to simplify (and speed-up!) a lot of code: - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR ### Benchmarks All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. #### DaCapo/AArch64 Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. 
The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean, what is it actually doing that a change in locking shows a >30% perf difference?

benchmark | baseline | fast-locking | % | size
-- | -- | -- | -- | --
avrora | 27859 | 27563 | 1.07% | large
batik | 20786 | 20847 | -0.29% | large
biojava | 27421 | 27334 | 0.32% | default
eclipse | 59918 | 60522 | -1.00% | large
fop | 3670 | 3678 | -0.22% | default
graphchi | 2088 | 2060 | 1.36% | default
h2 | 297391 | 291292 | 2.09% | huge
jme | 8762 | 8877 | -1.30% | default
jython | 18938 | 18878 | 0.32% | default
luindex | 1339 | 1325 | 1.06% | default
lusearch | 918 | 936 | -1.92% | default
pmd | 58291 | 58423 | -0.23% | large
sunflow | 32617 | 24961 | 30.67% | large
tomcat | 25481 | 25992 | -1.97% | large
tradebeans | 314640 | 311706 | 0.94% | huge
tradesoap | 107473 | 110246 | -2.52% | huge
xalan | 6047 | 5882 | 2.81% | default
zxing | 970 | 926 | 4.75% | default

#### DaCapo/x86_64

The following measurements have been taken on an Intel Xeon Scalable Processor (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above.

benchmark | baseline | fast-locking | % | size
-- | -- | -- | -- | --
avrora | 127690 | 126749 | 0.74% | large
batik | 12736 | 12641 | 0.75% | large
biojava | 15423 | 15404 | 0.12% | default
eclipse | 41174 | 41498 | -0.78% | large
fop | 2184 | 2172 | 0.55% | default
graphchi | 1579 | 1560 | 1.22% | default
h2 | 227614 | 230040 | -1.05% | huge
jme | 8591 | 8398 | 2.30% | default
jython | 13473 | 13356 | 0.88% | default
luindex | 824 | 813 | 1.35% | default
lusearch | 962 | 968 | -0.62% | default
pmd | 40827 | 39654 | 2.96% | large
sunflow | 53362 | 43475 | 22.74% | large
tomcat | 27549 | 28029 | -1.71% | large
tradebeans | 190757 | 190994 | -0.12% | huge
tradesoap | 68099 | 67934 | 0.24% | huge
xalan | 7969 | 8178 | -2.56% | default
zxing | 1176 | 1148 | 2.44% | default

#### Renaissance/AArch64

This tests Renaissance/JMH version 0.14.1 on the same machines as DaCapo above, with the same JVM settings.
benchmark | baseline | fast-locking | % -- | -- | -- | -- AkkaUct | 2558.832 | 2513.594 | 1.80% Reactors | 14715.626 | 14311.246 | 2.83% Als | 1851.485 | 1869.622 | -0.97% ChiSquare | 1007.788 | 1003.165 | 0.46% GaussMix | 1157.491 | 1149.969 | 0.65% LogRegression | 717.772 | 733.576 | -2.15% MovieLens | 7916.181 | 8002.226 | -1.08% NaiveBayes | 395.296 | 386.611 | 2.25% PageRank | 4294.939 | 4346.333 | -1.18% FjKmeans | 519.2 | 498.357 | 4.18% FutureGenetic | 2578.504 | 2589.255 | -0.42% Mnemonics | 4898.886 | 4903.689 | -0.10% ParMnemonics | 4260.507 | 4210.121 | 1.20% Scrabble | 139.37 | 138.312 | 0.76% RxScrabble | 320.114 | 322.651 | -0.79% Dotty | 1056.543 | 1068.492 | -1.12% ScalaDoku | 3443.117 | 3449.477 | -0.18% ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% FinagleChirper | 6814.192 | 6853.38 | -0.57% FinagleHttp | 4762.902 | 4807.564 | -0.93% #### Renaissance/x86_64 benchmark | baseline | fast-locking | % -- | -- | -- | -- AkkaUct | 1117.185 | 1116.425 | 0.07% Reactors | 11561.354 | 11812.499 | -2.13% Als | 1580.838 | 1575.318 | 0.35% ChiSquare | 459.601 | 467.109 | -1.61% GaussMix | 705.944 | 685.595 | 2.97% LogRegression | 659.944 | 656.428 | 0.54% MovieLens | 7434.303 | 7592.271 | -2.08% NaiveBayes | 413.482 | 417.369 | -0.93% PageRank | 3259.233 | 3276.589 | -0.53% FjKmeans | 946.429 | 938.991 | 0.79% FutureGenetic | 1760.672 | 1815.272 | -3.01% Scrabble | 147.996 | 150.084 | -1.39% RxScrabble | 177.755 | 177.956 | -0.11% Dotty | 673.754 | 683.919 | -1.49% ScalaKmeans | 165.376 | 168.925 | -2.10% ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. ### Testing - [x] tier1 (x86_64, aarch64, x86_32) - [x] tier2 (x86_64, aarch64) - [x] tier3 (x86_64, aarch64) - [x] tier4 (x86_64, aarch64) ------------- Commit messages: - Merge tag 'jdk-20+17' into fast-locking - Fix OSR packing in AArch64, part 2 - Fix OSR packing in AArch64 - Merge remote-tracking branch 'upstream/master' into fast-locking - Fix register in interpreter unlock x86_32 - Support unstructured locking in interpreter (x86 parts) - Support unstructured locking in interpreter (aarch64 and shared parts) - Merge branch 'master' into fast-locking - Merge branch 'master' into fast-locking - Added test for hand-over-hand locking - ... 
and 17 more: https://git.openjdk.org/jdk/compare/79ccc791...3ed51053 Changes: https://git.openjdk.org/jdk/pull/9680/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9680&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8291555 Stats: 3660 lines in 127 files changed: 650 ins; 2481 del; 529 mod Patch: https://git.openjdk.org/jdk/pull/9680.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9680/head:pull/9680 PR: https://git.openjdk.org/jdk/pull/9680 From stuefe at openjdk.org Thu Oct 6 07:47:02 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Oct 2022 07:47:02 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Thu, 28 Jul 2022 19:58:34 GMT, Roman Kennke wrote: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. 
When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? 
> > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. > > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
> > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 519.2 | 498.357 | 4.18% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) When I run renaissance philosophers benchmark (no arguments, just the default settings) on my 12 core machine the VM intermittently hangs after the benchmark is done. Always, two threads keep running at 100% CPU. 
I have been able to attach gdb once and we were in a tight loop in

(gdb) bt
#0 Atomic::PlatformLoad<8ul>::operator() (dest=0x7f991c119e80, this=) at src/hotspot/share/runtime/atomic.hpp:614
#1 Atomic::LoadImpl, void>::operator() (dest=0x7f991c119e80, this=) at src/hotspot/share/runtime/atomic.hpp:392
#2 Atomic::load (dest=0x7f991c119e80) at src/hotspot/share/runtime/atomic.hpp:615
#3 ObjectMonitor::owner_raw (this=0x7f991c119e40) at src/hotspot/share/runtime/objectMonitor.inline.hpp:66
#4 ObjectMonitor::owner (this=0x7f991c119e40) at src/hotspot/share/runtime/objectMonitor.inline.hpp:61
#5 ObjectSynchronizer::monitors_iterate (thread=0x7f9a30027230, closure=) at src/hotspot/share/runtime/synchronizer.cpp:983
#6 ObjectSynchronizer::release_monitors_owned_by_thread (current=current at entry=0x7f9a30027230) at src/hotspot/share/runtime/synchronizer.cpp:1492
#7 0x00007f9a351bc320 in JavaThread::exit (this=this at entry=0x7f9a30027230, destroy_vm=destroy_vm at entry=false, exit_type=exit_type at entry=JavaThread::jni_detach) at src/hotspot/share/runtime/javaThread.cpp:851
#8 0x00007f9a352445ca in jni_DetachCurrentThread (vm=) at src/hotspot/share/prims/jni.cpp:3962
#9 0x00007f9a35f9ac7e in JavaMain (_args=) at src/java.base/share/native/libjli/java.c:555
#10 0x00007f9a35f9e30d in ThreadJavaMain (args=) at src/java.base/unix/native/libjli/java_md.c:650
#11 0x00007f9a35d47609 in start_thread (arg=) at pthread_create.c:477
#12 0x00007f9a35ea3133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

in one thread, which points to a malformed monitor list. I tried to reproduce it with a debug build, but no such luck. I was able to reproduce it once again with a release build. I'll see if I can find out more.

Happens when the main thread detaches itself upon VM exit. VM attempts to release OMs that are owned by the finished main thread (side note: if that is the sole surviving thread, maybe that step could be skipped?). That happens before DestroyVM, so OM final audit did not yet run.

Problem here is the OM in use list is circular (and very big, ca 11mio entries).

I was able to reproduce it with a fastdebug build in 1 out of 5-6 runs. Also with fewer benchmark cycles (-r 3).

Offlist questions from Roman:
- "Does it really not happen with stock?" No, I could not reproduce it with the stock VM (built from f5d1b5bda27c798347ae278cbf69725ed4be895c, the commit preceding the PR).
- "Do we now have more OMs than before?" I cannot see that effect. Running philosophers with -r 3 causes the VM in the end to have between 800k and ~2mio open OMs *if the error does not happen*, with no difference between stock and PR VM. In cases where the PR-VM hangs we have a lot more, as I wrote, about 11-12mio OMs.

------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 07:47:03 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:03 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Wed, 3 Aug 2022 07:17:51 GMT, Thomas Stuefe wrote:

> Happens when the main thread detaches itself upon VM exit. VM attempts to release OMs that are owned by the finished main thread (side note: if that is the sole surviving thread, maybe that step could be skipped?). That happens before DestroyVM, so OM final audit did not yet run.
>
> Problem here is the OM in use list is circular (and very big, ca 11mio entries).
> > I was able to reproduce it with a fastdebug build in 1 out of 5-6 runs. Also with less benchmark cycles (-r 3). Hi Thomas, thanks for testing and reporting the issue. I just pushed an improvement (and simplification) of the monitor-enter-inflate path, and cannot seem to reproduce the problem anymore. Can you please try again with the latest change? ------------- PR: https://git.openjdk.org/jdk/pull/9680 From stuefe at openjdk.org Thu Oct 6 07:47:04 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Oct 2022 07:47:04 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: <87vYXa_Uu88sb8ldFeGdHfeqPCMPxhhzqbVOooXle7A=.09d21ecc-8910-464d-b164-88b8322ebd34@github.com> On Sun, 7 Aug 2022 12:50:01 GMT, Roman Kennke wrote: > > Happens when the main thread detaches itself upon VM exit. VM attempts to release OMs that are owned by the finished main thread (side note: if that is the sole surviving thread, maybe that step could be skipped?). That happens before DestroyVM, so OM final audit did not yet run. > > Problem here is the OM in use list is circular (and very big, ca 11mio entries). > > I was able to reproduce it with a fastdebug build in 1 out of 5-6 runs. Also with less benchmark cycles (-r 3). > > Hi Thomas, thanks for testing and reporting the issue. I just pushed an improvement (and simplification) of the monitor-enter-inflate path, and cannot seem to reproduce the problem anymore. Can you please try again with the latest change? New version ran for 30 mins without crashing. Not a solid proof, but its better :-) ------------- PR: https://git.openjdk.org/jdk/pull/9680 From dholmes at openjdk.org Thu Oct 6 07:47:05 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Oct 2022 07:47:05 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Thu, 28 Jul 2022 19:58:34 GMT, Roman Kennke wrote: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). 
Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. 
Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? > > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. > > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
> > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 519.2 | 498.357 | 4.18% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) The bar for acceptance for a brand new locking scheme with no fallback is extremely high and needs a lot of bake time and broad performance measurements, to watch for pathologies. That bar is lower if the scheme can be reverted to the old code if needed; and even lower still if the scheme is opt-in in the first place. For Java Object Monitors I made the new mechanism opt-in so the same could be done here. Granted it is not a trivial effort to do that, but I think a phased approach to transition to the new scheme is essential. It could be implemented as an experimental feature initially. I am not aware, please refresh my memory if you know different, of any core hotspot subsystem just being replaced in one fell swoop in one single release. Yes this needs a lot of testing but customers are not beta-testers. If this goes into a release on by default then there must be a way for customers to turn it off. UseHeavyMonitors is not a fallback as it is not for production use itself. So the new code has to co-exist along-side the old code as we make a transition across 2-3 releases. And yes that means a double-up on some testing as we already do for many things. Any fast locking scheme benefits the uncontended sync case. So if you have a lot of contention and therefore a lot of inflation, the fast locking won't show any benefit. 
What "modern workloads" are you using to measure this? We eventually got rid of biased-locking because it no longer showed any benefit, so it is possible that fast locking (of whichever form) could go the same way. And we may have moved past heavy use of synchronized in general for that matter, especially as Loom instigated many changes over to java.util.concurrent locks. Is UseHeavyMonitors in good enough shape to reliably be used for benchmark comparisons? I don't have github notification enabled so I missed this discussion. The JVMS permits lock A, lock B, unlock A, unlock B, in bytecode - i.e it passes verification and it does not violate the structured locking rules. It probably also passes verification if there is no exception table entries such that the unlocks are guaranteed to happen - regardless of the order. IIUC from above the VM will actually unlock all monitors for which there is a lock-record in the activation when the activation returns. The order in which it does that may be different to how the program would have done it but I don't see how that makes any difference to anything. ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 07:47:06 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:06 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: <8MGsPdlBSWGR-pgF8_fLo_mez67z7nHWXg8UOcjJxIY=.38bd9c0f-3ba0-4ebe-867d-b54608f01e63@github.com> On Mon, 8 Aug 2022 12:14:38 GMT, David Holmes wrote: > The bar for acceptance for a brand new locking scheme with no fallback is extremely high and needs a lot of bake time and broad performance measurements, to watch for pathologies. That bar is lower if the scheme can be reverted to the old code if needed; and even lower still if the scheme is opt-in in the first place. For Java Object Monitors I made the new mechanism opt-in so the same could be done here. Granted it is not a trivial effort to do that, but I think a phased approach to transition to the new scheme is essential. It could be implemented as an experimental feature initially. Reverting a change should not be difficult. (Unless maybe another major change arrived in the meantime, which makes reverse-applying a patch non-trivial.) I'm skeptical to implement an opt-in runtime-switch, though. - Keeping the old paths side-by-side with the new paths is an engineering effort in itself, as you point out. It means that it, too, introduces significant risks to break locking, one way or the other (or both). - Making the new path opt-in means that we achieve almost nothing by it: testing code would still normally run the old paths (hopefully we didn't break it by making the change), and only use the new paths when explicitely told so, and I don't expect that many people voluntarily do that. It *may* be more useful to make it opt-out, as a quick fix if anybody experiences troubles with it. - Do we need runtime-switchable opt-in or opt-out flag for the initial testing and baking? I wouldn't think so: it seems better and cleaner to take the Git branch of this PR and put it through all relevant testing before the change goes in. - For how long do you think the runtime switch should stay? Because if it's all but temporary, it means we better test both paths thoroughly and automated. And it may also mean extra maintenance work (with extra avenues for bugs, see above), too. 
> I am not aware, please refresh my memory if you know different, of any core hotspot subsystem just being replaced in one fell swoop in one single release. Yes this needs a lot of testing but customers are not beta-testers. If this goes into a release on by default then there must be a way for customers to turn it off. UseHeavyMonitors is not a fallback as it is not for production use itself. So the new code has to co-exist along-side the old code as we make a transition across 2-3 releases. And yes that means a double-up on some testing as we already do for many things. I believe the least risky path overall is to make UseHeavyMonitors a production flag. Then it can act as a kill-switch for the new locking code, should anything go bad. I even considered to remove stack-locking altogether, and could only show minor performance impact, and always only in code that uses obsolete synchronized Java collections like Vector, Stack and StringBuffer. If you'd argue that it's too risky to use UseHeavyMonitors for that - then certainly you understand that the risk of introducing a new flag and manage two stack-locking subsystems would be even higher. There's a lot of code that is risky in itself to keep both paths. For example, I needed to change register allocation in the C2 .ad declarations and also in the interpreter/generated assembly code. It's hard enough to see that it is correct for one of the implementations, and much harder to implement and verify this correctly for two. > Any fast locking scheme benefits the uncontended sync case. So if you have a lot of contention and therefore a lot of inflation, the fast locking won't show any benefit. Not only that. As far as I can tell, 'heavy monitors' would only be worse off in workloads that 1. use uncontended sync and 2. churns monitors. Lots of uncontended sync on the same monitor object is not actually worse than fast-locking (it boils down to a single CAS in both cases). It only gets bad when code keeps allocating short-lived objects and syncs on them once or a few times only, and then moves on to the next new sync objects. > What "modern workloads" are you using to measure this? So far I tested with SPECjbb and SPECjvm-workloads-transplanted-into-JMH, dacapo and renaissance. I could only measure regressions with heavy monitors in workloads that use XML/XSLT, which I found out is because the XSTL compiler generates code that uses StringBuffer for (single-threaded) parsing. I also found a few other places in XML where usage of Stack and Vector has some impact. I can provide fixes for those, if needed (but I'm not sure whether this should go into JDK, upstream Xalan/Xerxes or both). > We eventually got rid of biased-locking because it no longer showed any benefit, so it is possible that fast locking (of whichever form) could go the same way. And we may have moved past heavy use of synchronized in general for that matter, especially as Loom instigated many changes over to java.util.concurrent locks. Yup. > Is UseHeavyMonitors in good enough shape to reliably be used for benchmark comparisons? Yes, except that the flag would have to be made product. Also, it is useful to use this PR instead of upstream JDK, because it simplifies the inflation protocol pretty much like it would be simplified without any stack-locking. I can make a standalone PR that gets rid of stack-locking altogether, if that is useful. 
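To make the 'uncontended but churned' pattern described above concrete, here is a hypothetical single-threaded loop (not taken from the benchmarks or the PR; the class name is made up) that hits both conditions at once: every lock acquisition is uncontended, and every lock object is used a few times and then dropped. With stack- or fast-locking these syncs stay on the fast path; with heavy monitors only, each short-lived StringBuffer can drag an ObjectMonitor with it, which is the churn being discussed:

```java
// Hypothetical illustration of ObjectMonitor churn: uncontended, single-use locks.
public class LockChurnExample {
    public static void main(String[] args) {
        long sink = 0;
        for (int i = 0; i < 10_000_000; i++) {
            // StringBuffer methods synchronize on the (brand-new) instance,
            // so each iteration locks and unlocks a fresh, never-contended object.
            StringBuffer sb = new StringBuffer();
            sb.append("value-").append(i);
            sink += sb.length();
        }
        System.out.println(sink); // keep the work observable
    }
}
```

The same shape shows up with Vector or Stack instances that never escape a single thread; steady reuse of one synchronized object is not the problem, constantly creating new ones is.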
Also keep in mind that both this fast-locking PR and total removal of stack-locking would enable some follow-up improvements: we'd no longer have to inflate monitors in order to install or read an i-hashcode. And GC code similarily may benefit from easier read/write of object age bits. This might benefit generational concurrent GC efforts. ------------- PR: https://git.openjdk.org/jdk/pull/9680 From stuefe at openjdk.org Thu Oct 6 07:47:07 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 6 Oct 2022 07:47:07 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <8MGsPdlBSWGR-pgF8_fLo_mez67z7nHWXg8UOcjJxIY=.38bd9c0f-3ba0-4ebe-867d-b54608f01e63@github.com> References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> <8MGsPdlBSWGR-pgF8_fLo_mez67z7nHWXg8UOcjJxIY=.38bd9c0f-3ba0-4ebe-867d-b54608f01e63@github.com> Message-ID: On Mon, 8 Aug 2022 13:45:06 GMT, Roman Kennke wrote: > The bar for acceptance for a brand new locking scheme with no fallback is extremely high and needs a lot of bake time and broad performance measurements, to watch for pathologies. That bar is lower if the scheme can be reverted to the old code if needed; and even lower still if the scheme is opt-in in the first place. For Java Object Monitors I made the new mechanism opt-in so the same could be done here. Granted it is not a trivial effort to do that, but I think a phased approach to transition to the new scheme is essential. It could be implemented as an experimental feature initially. I fully agree that have to be careful, but I share Roman's viewpoint. If this work is something we want to happen and which is not in doubt in principle, then we also want the broadest possible test front. In my experience, opt-in coding is tested poorly. A runtime switch is fine as an emergency measure when you have customer problems, but then both standard and fallback code paths need to be very well tested. With something as ubiquitous as locking this would mean running almost the full test set with and without the new fast locking mechanism, and that is not feasible. Or even if it is, not practical: the cycles are better invested in hardening out the new locking mechanism. And arguably, we already have an opt-out mechanism in the form of UseHeavyMonitors. It's not ideal, but as Roman wrote, in most scenarios, this does not show any regression. So in a pinch, it could serve as a short-term solution if the new fast lock mechanism is broken. In my opinion, the best time for such an invasive change is the beginning of the development cycle for a non-LTS-release, like now. And we don't have to push the PR in a rush, we can cook it in its branch and review it very thoroughly. Cheers, Thomas ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rehn at openjdk.org Thu Oct 6 07:47:07 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 6 Oct 2022 07:47:07 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Thu, 28 Jul 2022 19:58:34 GMT, Roman Kennke wrote: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. 
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) 
a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? > > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. 
> > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 519.2 | 498.357 | 4.18% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. 
> > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) I ran some test locally, 4 JDI fails and 3 JVM TI, all seems to fail in: #7 0x00007f7cefc5c1ce in Thread::is_lock_owned (this=this at entry=0x7f7ce801dd90, adr=adr at entry=0x1 ) at /home/rehn/source/jdk/ongit/dev-jdk/open/src/hotspot/share/runtime/thread.cpp:549 #8 0x00007f7cef22c062 in JavaThread::is_lock_owned (this=0x7f7ce801dd90, adr=0x1 ) at /home/rehn/source/jdk/ongit/dev-jdk/open/src/hotspot/share/runtime/javaThread.cpp:979 #9 0x00007f7cefc79ab0 in Threads::owning_thread_from_monitor_owner (t_list=, owner=owner at entry=0x1 ) at /home/rehn/source/jdk/ongit/dev-jdk/open/src/hotspot/share/runtime/threads.cpp:1382 I didn't realize you still also is using the frame basic lock area. (in other projects this is removed and all cases are handled via the threads lock stack) So essentially we have two lock stacks when running in interpreter the frame area and the LockStack. That explains why I have not heard anything about popframe and friends :) ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 07:47:08 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:08 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Mon, 8 Aug 2022 15:44:50 GMT, Robbin Ehn wrote: > I ran some test locally, 4 JDI fails and 3 JVM TI, all seems to fail in: > > ``` > #7 0x00007f7cefc5c1ce in Thread::is_lock_owned (this=this at entry=0x7f7ce801dd90, adr=adr at entry=0x1 ) at /home/rehn/source/jdk/ongit/dev-jdk/open/src/hotspot/share/runtime/thread.cpp:549 > #8 0x00007f7cef22c062 in JavaThread::is_lock_owned (this=0x7f7ce801dd90, adr=0x1 ) at /home/rehn/source/jdk/ongit/dev-jdk/open/src/hotspot/share/runtime/javaThread.cpp:979 > #9 0x00007f7cefc79ab0 in Threads::owning_thread_from_monitor_owner (t_list=, owner=owner at entry=0x1 ) > at /home/rehn/source/jdk/ongit/dev-jdk/open/src/hotspot/share/runtime/threads.cpp:1382 > ``` Thanks, Robbin! That was a bug in JvmtiBase::get_owning_thread() where an anonymous owner must be converted to the oop address before passing down to Threads::owning_thread_from_monitor_owner(). I pushed a fix. Can you re-test? Testing com/sun/jdi passes for me, now. > I didn't realize you still also is using the frame basic lock area. (in other projects this is removed and all cases are handled via the threads lock stack) So essentially we have two lock stacks when running in interpreter the frame area and the LockStack. > > That explains why I have not heard anything about popframe and friends :) Hmm yeah, I also realized this recently :-D I will have to clean this up before going further. And I'll also will work to support the unstructured locking in the interpreter. 
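The failures above all bottom out in Threads::owning_thread_from_monitor_owner() being handed the anonymous-owner marker (0x1) instead of a real owner. As a rough, illustrative sketch of the idea only (plain Java, invented names, not the actual HotSpot code or the actual fix): when a monitor's owner field carries an anonymous marker, the 'which thread owns X?' query has to fall back to searching the per-thread lock stacks for the monitor's lock object, instead of comparing owner pointers:

```java
import java.util.List;
import java.util.Optional;

// Illustrative only: owner lookup when a monitor may carry an 'anonymous owner' marker.
final class AnonymousOwnerLookup {
    static final Object ANONYMOUS_OWNER = new Object(); // stand-in for the 0x1 marker

    record ToyMonitor(Object lockObject, Object owner) {}
    record ThreadState(Thread thread, List<Object> lockStack) {}

    static Optional<Thread> owningThread(ToyMonitor m, List<ThreadState> allThreads) {
        if (m.owner() == ANONYMOUS_OWNER) {
            // The monitor does not know its owner; whichever thread still has the
            // lock object on its lock stack is the (anonymous) owner.
            for (ThreadState ts : allThreads) {
                if (ts.lockStack().contains(m.lockObject())) {
                    return Optional.of(ts.thread());
                }
            }
            return Optional.empty();
        }
        // Otherwise the owner field identifies the owning thread directly.
        return m.owner() instanceof Thread t ? Optional.of(t) : Optional.empty();
    }
}
```

In a real VM such a lookup would of course need the appropriate synchronization; the sketch only shows the shape of the query that callers like JVMTI and deadlock detection need.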
------------- PR: https://git.openjdk.org/jdk/pull/9680 From rehn at openjdk.org Thu Oct 6 07:47:09 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 6 Oct 2022 07:47:09 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Mon, 8 Aug 2022 18:29:54 GMT, Roman Kennke wrote: > > I ran some test locally, 4 JDI fails and 3 JVM TI, all seems to fail in: > > ``` > > #7 0x00007f7cefc5c1ce in Thread::is_lock_owned (this=this at entry=0x7f7ce801dd90, adr=adr at entry=0x1 ) at /home/rehn/source/jdk/ongit/dev-jdk/open/src/hotspot/share/runtime/thread.cpp:549 > > #8 0x00007f7cef22c062 in JavaThread::is_lock_owned (this=0x7f7ce801dd90, adr=0x1 ) at /home/rehn/source/jdk/ongit/dev-jdk/open/src/hotspot/share/runtime/javaThread.cpp:979 > > #9 0x00007f7cefc79ab0 in Threads::owning_thread_from_monitor_owner (t_list=, owner=owner at entry=0x1 ) > > at /home/rehn/source/jdk/ongit/dev-jdk/open/src/hotspot/share/runtime/threads.cpp:1382 > > ``` > > Thanks, Robbin! That was a bug in JvmtiBase::get_owning_thread() where an anonymous owner must be converted to the oop address before passing down to Threads::owning_thread_from_monitor_owner(). I pushed a fix. Can you re-test? Testing com/sun/jdi passes for me, now. Yes, that fixed it. I'm running more tests also. I got this build problem on aarch64: open/src/hotspot/share/asm/assembler.hpp:168), pid=3387376, tid=3387431 # assert(is_bound() || is_unused()) failed: Label was never bound to a location, but it was used as a jmp target V [libjvm.so+0x4f4788] Label::~Label()+0x48 V [libjvm.so+0x424a44] cmpFastLockNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x764 V [libjvm.so+0x1643888] PhaseOutput::fill_buffer(CodeBuffer*, unsigned int*)+0x538 V [libjvm.so+0xa85fcc] Compile::Code_Gen()+0x3bc ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 07:47:10 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:10 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Tue, 9 Aug 2022 09:19:54 GMT, Robbin Ehn wrote: > I got this build problem on aarch64: Thanks for giving this PR a spin. I pushed a fix for the aarch64 build problem (seems weird that GHA did not catch it). ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rehn at openjdk.org Thu Oct 6 07:47:11 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 6 Oct 2022 07:47:11 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Tue, 9 Aug 2022 10:46:51 GMT, Roman Kennke wrote: > Thanks for giving this PR a spin. I pushed a fix for the aarch64 build problem (seems weird that GHA did not catch it). NP, thanks. I notice some other user of owning_thread_from_monitor_owner() such as DeadlockCycle::print_on_with() which asserts on "assert(adr != reinterpret_cast(1)) failed: must convert to lock object". 
------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 07:47:12 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:12 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <8MGsPdlBSWGR-pgF8_fLo_mez67z7nHWXg8UOcjJxIY=.38bd9c0f-3ba0-4ebe-867d-b54608f01e63@github.com> References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> <8MGsPdlBSWGR-pgF8_fLo_mez67z7nHWXg8UOcjJxIY=.38bd9c0f-3ba0-4ebe-867d-b54608f01e63@github.com> Message-ID: <9P1YMHdrh0hBVSsynwUQ5PVpU14yaF5V-00H5uWGLek=.fc7c13d9-3601-4f2e-8846-0b66eb0a13df@github.com> On Tue, 9 Aug 2022 09:32:47 GMT, Roman Kennke wrote: > I am not aware, please refresh my memory if you know different, of any core hotspot subsystem just being replaced in one fell swoop in one single release. Yes this needs a lot of testing but customers are not beta-testers. If this goes into a release on by default then there must be a way for customers to turn it off. UseHeavyMonitors is not a fallback as it is not for production use itself. So the new code has to co-exist along-side the old code as we make a transition across 2-3 releases. And yes that means a double-up on some testing as we already do for many things. Maybe it's worth to step back a little and discuss whether or not we actually want stack-locking (or a replacement) *at all*. My measurements seem to indicate that a majority of modern workloads (i.e. properly synchronized, not using legacy collections) actually benefit from running without stack-locking (or the fast-locking replacement). The workloads that suffer seem to be only such workloads which make heavy use of always-synchronized collections, code that we'd nowadays probably not consider 'idiomatic Java' anymore. This means that support for faster legacy code costs modern Java code actual performance points. Do we really want this? It may be wiser overall to simply drop stack-locking without replacement, and go and fix the identified locations where using of legacy collections affects performance negatively in the JDK (I found a few places in XML/XSLT code, for example). I am currently re-running my benchmarks to show this. ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 07:47:13 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:13 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Tue, 9 Aug 2022 11:05:45 GMT, Robbin Ehn wrote: > > Thanks for giving this PR a spin. I pushed a fix for the aarch64 build problem (seems weird that GHA did not catch it). > > NP, thanks. I notice some other user of owning_thread_from_monitor_owner() such as DeadlockCycle::print_on_with() which asserts on "assert(adr != reinterpret_cast(1)) failed: must convert to lock object". Do you know by any chance which tests trigger this? 
------------- PR: https://git.openjdk.org/jdk/pull/9680 From rehn at openjdk.org Thu Oct 6 07:47:13 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 6 Oct 2022 07:47:13 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: <4eWUSTg-0XMN2ON3FYCM5uAUeIlarRNXgPJquBCXTQs=.5272e1a1-dcbb-4360-bb6d-9c0bc9d35313@github.com> On Thu, 11 Aug 2022 11:19:31 GMT, Roman Kennke wrote: > > > Thanks for giving this PR a spin. I pushed a fix for the aarch64 build problem (seems weird that GHA did not catch it). > > > > > > NP, thanks. I notice some other user of owning_thread_from_monitor_owner() such as DeadlockCycle::print_on_with() which asserts on "assert(adr != reinterpret_cast(1)) failed: must convert to lock object". > > Do you know by any chance which tests trigger this? Yes, there is a couple of to choose from, I think the jstack cmd may be easiest: jstack/DeadlockDetectionTest.java ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 07:47:14 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:14 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Thu, 28 Jul 2022 19:58:34 GMT, Roman Kennke wrote: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). 
When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? 
> > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. > > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
> > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 519.2 | 498.357 | 4.18% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) I added implementation for arm, ppc and s390 blindly. @shipilev, @tstuefe maybe you could sanity-check them? most likely they are buggy. I also haven't checked riscv at all, yet. ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 07:47:15 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:15 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <4eWUSTg-0XMN2ON3FYCM5uAUeIlarRNXgPJquBCXTQs=.5272e1a1-dcbb-4360-bb6d-9c0bc9d35313@github.com> References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> <4eWUSTg-0XMN2ON3FYCM5uAUeIlarRNXgPJquBCXTQs=.5272e1a1-dcbb-4360-bb6d-9c0bc9d35313@github.com> Message-ID: On Thu, 11 Aug 2022 11:39:01 GMT, Robbin Ehn wrote: > > > > Thanks for giving this PR a spin. I pushed a fix for the aarch64 build problem (seems weird that GHA did not catch it). > > > > > > > > > NP, thanks. I notice some other user of owning_thread_from_monitor_owner() such as DeadlockCycle::print_on_with() which asserts on "assert(adr != reinterpret_cast(1)) failed: must convert to lock object". > > > > > > Do you know by any chance which tests trigger this? 
> > Yes, there is a couple of to choose from, I think the jstack cmd may be easiest: jstack/DeadlockDetectionTest.java I pushed a refactoring and fixes to the relevant code, and all users should now work correctly. It's passing test tiers1-3 and tier4 is running while I write this. @robehn or @dholmes-ora I believe one of you mentioned somewhere (can't find the comment, though) that we might need to support the bytecode sequence monitorenter A; monitorenter B; monitorexit A; monitorexit B; properly. I have now made a testcase that checks this, and it does indeed fail with this PR, while passing with upstream. Also, the JVM spec doesn't mention anywhere that it is required that monitorenter/exit are properly nested. I'll have to fix this in the interpreter (JIT compilers refuse to compile not-properly-nested monitorenter/exit anyway). See https://github.com/rkennke/jdk/blob/fast-locking/test/hotspot/jtreg/runtime/locking/TestUnstructuredLocking.jasm ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rehn at openjdk.org Thu Oct 6 07:47:15 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 6 Oct 2022 07:47:15 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <4eWUSTg-0XMN2ON3FYCM5uAUeIlarRNXgPJquBCXTQs=.5272e1a1-dcbb-4360-bb6d-9c0bc9d35313@github.com> References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> <4eWUSTg-0XMN2ON3FYCM5uAUeIlarRNXgPJquBCXTQs=.5272e1a1-dcbb-4360-bb6d-9c0bc9d35313@github.com> Message-ID: On Thu, 11 Aug 2022 11:39:01 GMT, Robbin Ehn wrote: >>> > Thanks for giving this PR a spin. I pushed a fix for the aarch64 build problem (seems weird that GHA did not catch it). >>> >>> NP, thanks. I notice some other user of owning_thread_from_monitor_owner() such as DeadlockCycle::print_on_with() which asserts on "assert(adr != reinterpret_cast(1)) failed: must convert to lock object". >> >> Do you know by any chance which tests trigger this? > >> > > Thanks for giving this PR a spin. I pushed a fix for the aarch64 build problem (seems weird that GHA did not catch it). >> > >> > >> > NP, thanks. I notice some other user of owning_thread_from_monitor_owner() such as DeadlockCycle::print_on_with() which asserts on "assert(adr != reinterpret_cast(1)) failed: must convert to lock object". >> >> Do you know by any chance which tests trigger this? > > Yes, there is a couple of to choose from, I think the jstack cmd may be easiest: jstack/DeadlockDetectionTest.java > @robehn or @dholmes-ora I believe one of you mentioned somewhere (can't find the comment, though) that we might need to support the bytecode sequence monitorenter A; monitorenter B; monitorexit A; monitorexit B; properly. I have now made a testcase that checks this, and it does indeed fail with this PR, while passing with upstream. Also, the JVM spec doesn't mention anywhere that it is required that monitorenter/exit are properly nested. I'll have to fix this in the interpreter (JIT compilers refuse to compile not-properly-nested monitorenter/exit anyway). > > See https://github.com/rkennke/jdk/blob/fast-locking/test/hotspot/jtreg/runtime/locking/TestUnstructuredLocking.jasm jvms-2.11.10 > Structured locking is the situation when, during a method invocation, every exit on a given monitor matches a preceding entry on that monitor. 
Since there is no assurance that all code submitted to the Java Virtual Machine will perform structured locking, implementations of the Java Virtual Machine are permitted but not required to enforce both of the following two rules guaranteeing structured locking. Let T be a thread and M be a monitor. Then: > > The number of monitor entries performed by T on M during a method invocation must equal the number of monitor exits performed by T on M during the method invocation whether the method invocation completes normally or abruptly. > > At no point during a method invocation may the number of monitor exits performed by T on M since the method invocation exceed the number of monitor entries performed by T on M since the method invocation. > > Note that the monitor entry and exit automatically performed by the Java Virtual Machine when invoking a synchronized method are considered to occur during the calling method's invocation. I think the intent of above was to allow enforcing structured locking. In relevant other projects, we support only structured locking in Java, but permit some unstructured locking when done via JNI. In that project JNI monitor enter/exit do not use the lockstack. I don't think we today fully support unstructured locking either: void foo_lock() { monitorenter(this); // If VM abruptly returns here 'this' will be unlocked // Because VM assumes structured locking. // see e.g. remove_activation(...) } *I scratch this as it was a bit off topic.* ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 07:47:16 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:16 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> <4eWUSTg-0XMN2ON3FYCM5uAUeIlarRNXgPJquBCXTQs=.5272e1a1-dcbb-4360-bb6d-9c0bc9d35313@github.com> Message-ID: On Tue, 16 Aug 2022 15:47:58 GMT, Robbin Ehn wrote: > > @robehn or @dholmes-ora I believe one of you mentioned somewhere (can't find the comment, though) that we might need to support the bytecode sequence monitorenter A; monitorenter B; monitorexit A; monitorexit B; properly. I have now made a testcase that checks this, and it does indeed fail with this PR, while passing with upstream. Also, the JVM spec doesn't mention anywhere that it is required that monitorenter/exit are properly nested. I'll have to fix this in the interpreter (JIT compilers refuse to compile not-properly-nested monitorenter/exit anyway). > > See https://github.com/rkennke/jdk/blob/fast-locking/test/hotspot/jtreg/runtime/locking/TestUnstructuredLocking.jasm > > jvms-2.11.10 > > > Structured locking is the situation when, during a method invocation, every exit on a given monitor matches a preceding entry on that monitor. Since there is no assurance that all code submitted to the Java Virtual Machine will perform structured locking, implementations of the Java Virtual Machine are permitted but not required to enforce both of the following two rules guaranteeing structured locking. Let T be a thread and M be a monitor. Then: > > The number of monitor entries performed by T on M during a method invocation must equal the number of monitor exits performed by T on M during the method invocation whether the method invocation completes normally or abruptly. 
> > At no point during a method invocation may the number of monitor exits performed by T on M since the method invocation exceed the number of monitor entries performed by T on M since the method invocation. > > Note that the monitor entry and exit automatically performed by the Java Virtual Machine when invoking a synchronized method are considered to occur during the calling method's invocation. > > I think the intent of above was to allow enforcing structured locking. TBH, I don't see how this affects the scenario that I'm testing. The scenario: monitorenter A; monitorenter B; monitorexit A; monitorexit B; doesn't violate either of the two conditions: - the number of monitorenters and -exits during the execution always matches - the number of monitorexits for each monitor does not exceed the number of monitorenters for the same monitor Strictly speaking, I believe the conditions check for the (weaker) balanced property, but not for the (stronger) structured property. > In relevant other projects, we support only structured locking in Java, but permit some unstructured locking when done via JNI. In that project JNI monitor enter/exit do not use the lockstack. Yeah, JNI locking always inflates and uses full monitors. My proposal hasn't changed this. > I don't think we today fully support unstructured locking either: > > ``` > void foo_lock() { > monitorenter(this); > // If VM abruptly returns here 'this' will be unlocked > // Because VM assumes structured locking. > // see e.g. remove_activation(...) > } > ``` > > _I scratch this as it was a bit off topic._ Hmm yeah, this is required for properly handling exceptions. I have seen this making a bit of a mess in C1 code. That said, unstructured locking today only ever works in the interpreter, the JIT compilers would refuse to compile unstructured locking code. So if somebody would come up with a language and compiler that emits unstructured (e.g. hand-over-hand) locks, it would run, but only very slowly. I think I know how to make my proposal handle unstructured locking properly: In the interpreter monitorexit, I can check the top of the lock-stack, and if it doesn't match, call into the runtime, and there it's easy to implement the unstructured scenario. ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rehn at openjdk.org Thu Oct 6 07:47:17 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 6 Oct 2022 07:47:17 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> <4eWUSTg-0XMN2ON3FYCM5uAUeIlarRNXgPJquBCXTQs=.5272e1a1-dcbb-4360-bb6d-9c0bc9d35313@github.com> Message-ID: On Tue, 16 Aug 2022 16:21:04 GMT, Roman Kennke wrote: > Strictly speaking, I believe the conditions check for the (weaker) balanced property, but not for the (stronger) structured property. I know but the text says: - "every exit on a given monitor matches a preceding entry on that monitor." - "implementations of the Java Virtual Machine are permitted but not required to enforce both of the following two rules guaranteeing structured locking" I read this as: if the rules do not guarantee structured locking, the rules are not correct. The VM is allowed to enforce it. But that's just my take on it. EDIT: Maybe I'm reading too much into it. Lock A,B then unlock A,B maybe is considered structured locking?
But then again what if: void foo_lock() { monitorenter(A); monitorenter(B); // If VM abruptly returns here // VM can unlock them in reverse order first B and then A ? monitorexit(A); monitorexit(B); } ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 07:47:17 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:17 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> <4eWUSTg-0XMN2ON3FYCM5uAUeIlarRNXgPJquBCXTQs=.5272e1a1-dcbb-4360-bb6d-9c0bc9d35313@github.com> Message-ID: On Wed, 17 Aug 2022 07:29:23 GMT, Robbin Ehn wrote: > > Strictly speaking, I believe the conditions check for the (weaker) balanced property, but not for the (stronger) structured property. > > I know but the text says: > > * "every exit on a given monitor matches a preceding entry on that monitor." > > * "implementations of the Java Virtual Machine are permitted but not required to enforce both of the following two rules guaranteeing structured locking" > > > I read this as if the rules do not guarantee structured locking the rules are not correct. The VM is allowed to enforce it. But thats just my take on it. > > EDIT: Maybe I'm reading to much into it. Lock A,B then unlock A,B maybe is considered structured locking? > > But then again what if: > > ``` > void foo_lock() { > monitorenter(A); > monitorenter(B); > // If VM abruptly returns here > // VM can unlock them in reverse order first B and then A ? > monitorexit(A); > monitorexit(B); > } > ``` Do you think there would be any chance to clarify the spec there? Or even outright disallow unstructured/not-properly-nested locking altogether (and maybe allow the verifier to check it)? That would certainly be the right thing to do. And, afaict, it would do no harm because no compiler of any language would ever emit unstructured locking anyway - because if it did, the resulting code would crawl interpreted-only). ------------- PR: https://git.openjdk.org/jdk/pull/9680 From kvn at openjdk.org Thu Oct 6 07:47:18 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 6 Oct 2022 07:47:18 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> <4eWUSTg-0XMN2ON3FYCM5uAUeIlarRNXgPJquBCXTQs=.5272e1a1-dcbb-4360-bb6d-9c0bc9d35313@github.com> Message-ID: On Wed, 17 Aug 2022 15:34:01 GMT, Roman Kennke wrote: >>> Strictly speaking, I believe the conditions check for the (weaker) balanced property, but not for the (stronger) structured property. >> >> I know but the text says: >> - "every exit on a given monitor matches a preceding entry on that monitor." >> - "implementations of the Java Virtual Machine are permitted but not required to enforce both of the following two rules guaranteeing structured locking" >> >> I read this as if the rules do not guarantee structured locking the rules are not correct. >> The VM is allowed to enforce it. >> But thats just my take on it. >> >> EDIT: >> Maybe I'm reading to much into it. >> Lock A,B then unlock A,B maybe is considered structured locking? >> >> But then again what if: >> >> >> void foo_lock() { >> monitorenter(A); >> monitorenter(B); >> // If VM abruptly returns here >> // VM can unlock them in reverse order first B and then A ? 
>> monitorexit(A); >> monitorexit(B); >> } > >> > Strictly speaking, I believe the conditions check for the (weaker) balanced property, but not for the (stronger) structured property. >> >> I know but the text says: >> >> * "every exit on a given monitor matches a preceding entry on that monitor." >> >> * "implementations of the Java Virtual Machine are permitted but not required to enforce both of the following two rules guaranteeing structured locking" >> >> >> I read this as if the rules do not guarantee structured locking the rules are not correct. The VM is allowed to enforce it. But thats just my take on it. >> >> EDIT: Maybe I'm reading to much into it. Lock A,B then unlock A,B maybe is considered structured locking? >> >> But then again what if: >> >> ``` >> void foo_lock() { >> monitorenter(A); >> monitorenter(B); >> // If VM abruptly returns here >> // VM can unlock them in reverse order first B and then A ? >> monitorexit(A); >> monitorexit(B); >> } >> ``` > > Do you think there would be any chance to clarify the spec there? Or even outright disallow unstructured/not-properly-nested locking altogether (and maybe allow the verifier to check it)? That would certainly be the right thing to do. And, afaict, it would do no harm because no compiler of any language would ever emit unstructured locking anyway - because if it did, the resulting code would crawl interpreted-only). We need to understand performance effects of these changes. I don't see data here or new JMH benchmarks which can show data. @rkennke can you show data you have? And, please, update RFE description with what you have in PR description. @ericcaspole do we have JMH benchmarks to test performance for different lock scenarios? I see few tests in `test/micro` which use `synchronized`. Are they enough? Or we need more? Do we have internal benchmarks we could use for such testing? I would prefer to have "opt-in" but looking on scope of changes it may introduce more issues. Without "opt-in" I want performance comparison of VMs with different implementation instead of using `UseHeavyMonitors` to make judgement about this implementation. `UseHeavyMonitors` (product flag) should be tested separately to make sure when it is used as fallback mechanism by customers they would not get significant performance penalty. I agree with @tstuefe that we should test this PR a lot (all tiers on all supported platforms) including performance testing before integration. In addition we need full testing of this implementation with `UseHeavyMonitors` ON. And I should repeat that integration happens when changes are ready (no issues). We should not rush for particular JDK release. ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 07:47:19 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:19 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Tue, 30 Aug 2022 11:52:24 GMT, Roman Kennke wrote: >> I didn't realize you still also is using the frame basic lock area. (in other projects this is removed and all cases are handled via the threads lock stack) >> So essentially we have two lock stacks when running in interpreter the frame area and the LockStack. >> >> That explains why I have not heard anything about popframe and friends :) > >> I didn't realize you still also is using the frame basic lock area. 
(in other projects this is removed and all cases are handled via the threads lock stack) So essentially we have two lock stacks when running in interpreter the frame area and the LockStack. >> >> That explains why I have not heard anything about popframe and friends :) > > Hmm yeah, I also realized this recently :-D > I will have to clean this up before going further. And I'll also will work to support the unstructured locking in the interpreter. > We need to understand performance effects of these changes. I don't see data here or new JMH benchmarks which can show data. @rkennke can you show data you have? And, please, update RFE description with what you have in PR description. I did run macro benchmarks (SPECjvm, SPECjbb, renaissance, dacapo) and there performance is most often <1% from baseline, some better, some worse. However, I noticed that I made a mistake in my benchmark setup, and I have to re-run them again. So far it doesn't look like the results will be much different - only more reliable. Before I do proper re-runs, I first want to work on removing the interpreter lock-stack, and also to support 'weird' locking (see discussion above). I don't expect those to affect performance very much, because it will only change the interpreter paths. I haven't run any microbenchmarks, yet, but it may be useful. If you have any, please point me in the direction. > I would prefer to have "opt-in" but looking on scope of changes it may introduce more issues. Without "opt-in" I want performance comparison of VMs with different implementation instead of using `UseHeavyMonitors` to make judgement about this implementation. `UseHeavyMonitors` (product flag) should be tested separately to make sure when it is used as fallback mechanism by customers they would not get significant performance penalty. Yes, I can do that. > I agree with @tstuefe that we should test this PR a lot (all tiers on all supported platforms) including performance testing before integration. In addition we need full testing of this implementation with `UseHeavyMonitors` ON. Ok. I'd also suggest to run relevant (i.e. what relates to synchronized) jcstress tests. > And I should repeat that integration happens when changes are ready (no issues). We should not rush for particular JDK release. Sure, I am not planning on rushing this. ;-) > I didn't realize you still also is using the frame basic lock area. (in other projects this is removed and all cases are handled via the threads lock stack) So essentially we have two lock stacks when running in interpreter the frame area and the LockStack. > > That explains why I have not heard anything about popframe and friends :) I'm now wondering if what I kinda accidentally did there is not the sane thing to do. The 'real' lock-stack (the one that I added) holds all the (fast-)locked oops. The frame basic lock area also holds oops now (before it was oop-lock pairs), and in addition to the per-thread lock-stack it also holds the association frame->locks, which is useful when popping interpreter frames, so that we can exit all active locks easily. C1 and C2 don't need this, because 1. the monitor enter and exit there is always symmetric and 2. they have their own and more efficient ways to remove activations. How have you handled the interpreter lock-stack-area in your implementation? Is it worth to get rid of it and consolidate with the per-thread lock-stack? 
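For readers following along, a rough Java model of the per-thread lock stack described above is sketched below. This is not HotSpot's actual LockStack class; the names and the mark()/releaseDownTo() frame handling are assumptions, included only to illustrate the cheap ownership query and what consolidating the interpreter's per-frame lock area into the per-thread structure could look like.

```
import java.util.Arrays;

// A rough model (not HotSpot code) of a per-thread lock stack: a small, growable
// array of the objects this thread has fast-locked, in lock order.
final class LockStackSketch {
    private Object[] stack = new Object[8];
    private int top = 0;

    void push(Object lockee) {                 // called on successful fast-lock
        if (top == stack.length) {
            stack = Arrays.copyOf(stack, stack.length * 2);
        }
        stack[top++] = lockee;
    }

    boolean contains(Object o) {               // "does the current thread own me?"
        for (int i = 0; i < top; i++) {
            if (stack[i] == o) return true;
        }
        return false;
    }

    void remove(Object o) {                    // called on fast-unlock; usually hits the top slot
        for (int i = top - 1; i >= 0; i--) {
            if (stack[i] == o) {
                System.arraycopy(stack, i + 1, stack, i, top - i - 1);
                stack[--top] = null;
                return;
            }
        }
        throw new IllegalStateException("unbalanced unlock");
    }

    int mark() { return top; }                 // snapshot taken when a frame is entered

    Object[] releaseDownTo(int mark) {         // locks still held when that frame is removed
        Object[] leftOver = Arrays.copyOfRange(stack, mark, top);
        Arrays.fill(stack, mark, top, null);
        top = mark;
        return leftOver;
    }
}
```

A linear scan of a handful of entries keeps the common "does the current thread own me?" check cheap, which matches the 3-5 element observation quoted earlier in the thread; the frame-mark part is only meant to show one possible shape for the frame-to-locks association discussed here.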
------------- PR: https://git.openjdk.org/jdk/pull/9680 From rehn at openjdk.org Thu Oct 6 07:47:20 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 6 Oct 2022 07:47:20 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Fri, 9 Sep 2022 19:01:14 GMT, Roman Kennke wrote: > How have you handled the interpreter lock-stack-area in your implementation? Is it worth to get rid of it and consolidate with the per-thread lock-stack? At the moment I had to store a "frame id" for each entry in the lock stack. The frame id is previous fp, grabbed from "link()" when entering the locking code. private static final void monitorEnter(Object o) { .... long monitorFrameId = getCallerFrameId(); ``` When popping we can thus check if there is still monitors/locks for the frame to be popped. Remove activation reads the lock stack, with a bunch of assembly, e.g.: ` access_load_at(T_INT, IN_HEAP, rax, Address(rax, java_lang_Thread::lock_stack_pos_offset()), noreg, noreg); ` If we would keep this, loom freezing would need to relativize and derelativize these values. (we only have interpreter) But, according to JVMS 2.11.10. the VM only needs to automatically unlock synchronized method. This code that unlocks all locks in the frame seems to have been added for JLS 17.1. I have asked for clarification and we only need and should care about JVMS. So if we could make popframe do more work (popframe needs to unlock all), there seems to be way forward allowing more flexibility. Still working on trying to make what we have public, even if it's in roughly shape and it's very unclear if that is the correct approach at all. ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 07:47:21 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 07:47:21 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Mon, 12 Sep 2022 06:37:19 GMT, Robbin Ehn wrote: > > How have you handled the interpreter lock-stack-area in your implementation? Is it worth to get rid of it and consolidate with the per-thread lock-stack? > > At the moment I had to store a "frame id" for each entry in the lock stack. The frame id is previous fp, grabbed from "link()" when entering the locking code. > > ``` > private static final void monitorEnter(Object o) { > .... > long monitorFrameId = getCallerFrameId(); > ``` > > When popping we can thus check if there is still monitors/locks for the frame to be popped. Remove activation reads the lock stack, with a bunch of assembly, e.g.: ` access_load_at(T_INT, IN_HEAP, rax, Address(rax, java_lang_Thread::lock_stack_pos_offset()), noreg, noreg);` If we would keep this, loom freezing would need to relativize and derelativize these values. (we only have interpreter) Hmm ok. I was thinking something similar, but instead of storing pairs (oop/frame-id), push frame-markers on the lock-stack. But given that we only need all this for the interpreter, I am wondering if keeping what we have now (e.g. the per-frame-lock-stack in interpreter frame) is the saner thing to do. The overhead seems very small, perhaps very similar to keeping track of frames in the per-thread lock-stack. > But, according to JVMS 2.11.10. the VM only needs to automatically unlock synchronized method. 
This code that unlocks all locks in the frame seems to have been added for JLS 17.1. I have asked for clarification and we only need and should care about JVMS. > > So if we could make popframe do more work (popframe needs to unlock all), there seems to be way forward allowing more flexibility. > Still working on trying to make what we have public, even if it's in roughly shape and it's very unclear if that is the correct approach at all. Nice! >From your snippets above I am gleaning that your implementation has the actual lock-stack in Java. Is that correct? Is there a particular reason why you need this? Is this for Loom? Would the implementation that I am proposing here also work for your use-case(s)? Thanks, Roman ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rehn at openjdk.org Thu Oct 6 07:47:22 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 6 Oct 2022 07:47:22 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Mon, 12 Sep 2022 07:54:48 GMT, Roman Kennke wrote: > Nice! From your snippets above I am gleaning that your implementation has the actual lock-stack in Java. Is that correct? Is there a particular reason why you need this? Is this for Loom? Would the implementation that I am proposing here also work for your use-case(s)? > Yes, the entire implementation is in Java. void push(Object lockee, long fid) { if (this != Thread.currentThread()) Monitor.abort("invariant"); if (lockStackPos == lockStack.length) { grow(); } frameId[lockStackPos] = fid; lockStack[lockStackPos++] = lockee; } We are starting from the point of let's do everything be in Java. I want smart people to being able to change the implementation. So I really don't like the hardcoded assembly in remove_activation which do this check on frame id on the lock stack. If we can make the changes to e.g. popframe and take a bit different approach to JVMS we may have a total flexible Java implementation. But a flexible Java implementation means compiler can't have intrinsics, so what will the performance be.... We have more loose-ends than we can handle at the moment. Your code may be useable for JOM if we lock the implementation to using a lock-stack and we are going to write intrinsics to it. There is no point of it being in Java if so IMHO. ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rehn at openjdk.org Thu Oct 6 08:13:09 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 6 Oct 2022 08:13:09 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Mon, 12 Sep 2022 07:54:48 GMT, Roman Kennke wrote: >>> How have you handled the interpreter lock-stack-area in your implementation? Is it worth to get rid of it and consolidate with the per-thread lock-stack? >> >> At the moment I had to store a "frame id" for each entry in the lock stack. >> The frame id is previous fp, grabbed from "link()" when entering the locking code. >> >> private static final void monitorEnter(Object o) { >> .... >> long monitorFrameId = getCallerFrameId(); >> ``` >> When popping we can thus check if there is still monitors/locks for the frame to be popped. 
>> Remove activation reads the lock stack, with a bunch of assembly, e.g.: >> ` access_load_at(T_INT, IN_HEAP, rax, Address(rax, java_lang_Thread::lock_stack_pos_offset()), noreg, noreg); >> ` >> If we would keep this, loom freezing would need to relativize and derelativize these values. >> (we only have interpreter) >> >> But, according to JVMS 2.11.10. the VM only needs to automatically unlock synchronized method. >> This code that unlocks all locks in the frame seems to have been added for JLS 17.1. >> I have asked for clarification and we only need and should care about JVMS. >> >> So if we could make popframe do more work (popframe needs to unlock all), there seems to be way forward allowing more flexibility. >> >> Still working on trying to make what we have public, even if it's in roughly shape and it's very unclear if that is the correct approach at all. > >> > How have you handled the interpreter lock-stack-area in your implementation? Is it worth to get rid of it and consolidate with the per-thread lock-stack? >> >> At the moment I had to store a "frame id" for each entry in the lock stack. The frame id is previous fp, grabbed from "link()" when entering the locking code. >> >> ``` >> private static final void monitorEnter(Object o) { >> .... >> long monitorFrameId = getCallerFrameId(); >> ``` >> >> When popping we can thus check if there is still monitors/locks for the frame to be popped. Remove activation reads the lock stack, with a bunch of assembly, e.g.: ` access_load_at(T_INT, IN_HEAP, rax, Address(rax, java_lang_Thread::lock_stack_pos_offset()), noreg, noreg);` If we would keep this, loom freezing would need to relativize and derelativize these values. (we only have interpreter) > > Hmm ok. I was thinking something similar, but instead of storing pairs (oop/frame-id), push frame-markers on the lock-stack. > > But given that we only need all this for the interpreter, I am wondering if keeping what we have now (e.g. the per-frame-lock-stack in interpreter frame) is the saner thing to do. The overhead seems very small, perhaps very similar to keeping track of frames in the per-thread lock-stack. > >> But, according to JVMS 2.11.10. the VM only needs to automatically unlock synchronized method. This code that unlocks all locks in the frame seems to have been added for JLS 17.1. I have asked for clarification and we only need and should care about JVMS. >> >> So if we could make popframe do more work (popframe needs to unlock all), there seems to be way forward allowing more flexibility. > >> Still working on trying to make what we have public, even if it's in roughly shape and it's very unclear if that is the correct approach at all. > > Nice! > From your snippets above I am gleaning that your implementation has the actual lock-stack in Java. Is that correct? Is there a particular reason why you need this? Is this for Loom? Would the implementation that I am proposing here also work for your use-case(s)? > > Thanks, > Roman @rkennke I will have a look, but may I suggest to open a new PR and just reference this as background discussion? I think most of the comments above is not relevant enough for a new reviewer to struggle through. What do you think? ------------- PR: https://git.openjdk.org/jdk/pull/9680 From dean.long at oracle.com Thu Oct 6 08:16:49 2022 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 6 Oct 2022 01:16:49 -0700 Subject: Q: Should we use 64bit atomic in x86_64 patch_verified_entry code? 
In-Reply-To: References: Message-ID: <0c5290a6-577d-2bf0-3b45-fe05fae402d1@oracle.com> Hi Dmitry, On 10/5/22 5:55 AM, Dmitry Samersoff wrote: > Hello Everybody, > > I'm working on a crash that seems to be related to CMC[1] - the JVM > crashes when a method become not re-entrant because a JavaThread > executing a compiled method reaches an instruction partially-assembled > during patching of verified entry point. > > In the void NativeJump::patch_verified_entry() > > we atomically patch first 4 bytes, then atomically patch 5th byte, > then atomically patch first 4 bytes again. > > Is it better (from CMC point of view) to patch atomically 8 bytes > at once? > Yes, I believe so.? I was looking into exactly that recently, but I don't have a reliable reproducer to demonstrate the problem.? If you have a reproducer, please share it, along with what hardware you can reproduce it on, or go ahead and file a bug on bugs.openjdk.org with the details. thanks, dl > 1. http://cr.openjdk.java.net/~jrose/jvm/hotspot-cmc.html > > -Dmitry > From rkennke at openjdk.org Thu Oct 6 09:39:31 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 09:39:31 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Mon, 12 Sep 2022 07:54:48 GMT, Roman Kennke wrote: >>> How have you handled the interpreter lock-stack-area in your implementation? Is it worth to get rid of it and consolidate with the per-thread lock-stack? >> >> At the moment I had to store a "frame id" for each entry in the lock stack. >> The frame id is previous fp, grabbed from "link()" when entering the locking code. >> >> private static final void monitorEnter(Object o) { >> .... >> long monitorFrameId = getCallerFrameId(); >> ``` >> When popping we can thus check if there is still monitors/locks for the frame to be popped. >> Remove activation reads the lock stack, with a bunch of assembly, e.g.: >> ` access_load_at(T_INT, IN_HEAP, rax, Address(rax, java_lang_Thread::lock_stack_pos_offset()), noreg, noreg); >> ` >> If we would keep this, loom freezing would need to relativize and derelativize these values. >> (we only have interpreter) >> >> But, according to JVMS 2.11.10. the VM only needs to automatically unlock synchronized method. >> This code that unlocks all locks in the frame seems to have been added for JLS 17.1. >> I have asked for clarification and we only need and should care about JVMS. >> >> So if we could make popframe do more work (popframe needs to unlock all), there seems to be way forward allowing more flexibility. >> >> Still working on trying to make what we have public, even if it's in roughly shape and it's very unclear if that is the correct approach at all. > >> > How have you handled the interpreter lock-stack-area in your implementation? Is it worth to get rid of it and consolidate with the per-thread lock-stack? >> >> At the moment I had to store a "frame id" for each entry in the lock stack. The frame id is previous fp, grabbed from "link()" when entering the locking code. >> >> ``` >> private static final void monitorEnter(Object o) { >> .... >> long monitorFrameId = getCallerFrameId(); >> ``` >> >> When popping we can thus check if there is still monitors/locks for the frame to be popped. 
Remove activation reads the lock stack, with a bunch of assembly, e.g.: ` access_load_at(T_INT, IN_HEAP, rax, Address(rax, java_lang_Thread::lock_stack_pos_offset()), noreg, noreg);` If we would keep this, loom freezing would need to relativize and derelativize these values. (we only have interpreter) > > Hmm ok. I was thinking something similar, but instead of storing pairs (oop/frame-id), push frame-markers on the lock-stack. > > But given that we only need all this for the interpreter, I am wondering if keeping what we have now (e.g. the per-frame-lock-stack in interpreter frame) is the saner thing to do. The overhead seems very small, perhaps very similar to keeping track of frames in the per-thread lock-stack. > >> But, according to JVMS 2.11.10. the VM only needs to automatically unlock synchronized method. This code that unlocks all locks in the frame seems to have been added for JLS 17.1. I have asked for clarification and we only need and should care about JVMS. >> >> So if we could make popframe do more work (popframe needs to unlock all), there seems to be way forward allowing more flexibility. > >> Still working on trying to make what we have public, even if it's in roughly shape and it's very unclear if that is the correct approach at all. > > Nice! > From your snippets above I am gleaning that your implementation has the actual lock-stack in Java. Is that correct? Is there a particular reason why you need this? Is this for Loom? Would the implementation that I am proposing here also work for your use-case(s)? > > Thanks, > Roman > @rkennke I will have a look, but may I suggest to open a new PR and just reference this as background discussion? I think most of the comments above is not relevant enough for a new reviewer to struggle through. What do you think? Ok, will do that. Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 10:22:14 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 10:22:14 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Thu, 28 Jul 2022 19:58:34 GMT, Roman Kennke wrote: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). 
I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? > > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. > > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
> > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 519.2 | 498.357 | 4.18% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) Closing this PR in favour of a new, clean PR. ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 10:22:14 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 10:22:14 GMT Subject: Withdrawn: 8291555: Replace stack-locking with fast-locking In-Reply-To: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Thu, 28 Jul 2022 19:58:34 GMT, Roman Kennke wrote: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. 
> > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. 
This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? > > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. > > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
> > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 519.2 | 498.357 | 4.18% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/9680 From rkennke at openjdk.org Thu Oct 6 10:30:19 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 6 Oct 2022 10:30:19 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking Message-ID: This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
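To make the scheme being replaced concrete, here is a rough, self-contained model of the uncontended stack-locking fast path just described. This is not the actual HotSpot code: the type names, tag values and memory orderings are invented for the sketch, and the slow path, unlock and inflation are omitted.

```
#include <atomic>
#include <cstdint>

// Simplified model of stack-locking: the displaced header is saved in a
// stack-allocated slot and a pointer to that slot is CASed into the header.
struct ObjectHeader {
  std::atomic<uintptr_t> mark;       // low two bits: 01 = unlocked, 00 = locked
};

struct BasicLockSlot {               // lives in the locking thread's frame
  uintptr_t displaced_mark;          // copy of the original (unlocked) header
};

static const uintptr_t kTagMask     = 0x3;
static const uintptr_t kUnlockedTag = 0x1;   // assumed encoding for this sketch

// Returns true if the uncontended fast path succeeded; otherwise the caller
// would fall back to the slow path (e.g. inflation to a full monitor).
bool stack_lock(ObjectHeader* obj, BasicLockSlot* slot) {
  uintptr_t mark = obj->mark.load(std::memory_order_relaxed);
  if ((mark & kTagMask) != kUnlockedTag) {
    return false;
  }
  slot->displaced_mark = mark;       // remember the displaced header
  uintptr_t locked = reinterpret_cast<uintptr_t>(slot);  // aligned => low bits 00
  // On success the header now points into the owner's stack, which is how
  // "which thread owns this lock?" is answered under stack-locking.
  return obj->mark.compare_exchange_strong(mark, locked,
                                           std::memory_order_acquire,
                                           std::memory_order_relaxed);
}
```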
This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. This change enables to simplify (and speed-up!) a lot of code: - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR ### Benchmarks All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. 
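Before the numbers, here is a minimal model of the per-thread lock-stack described above. It only shows the shape of the idea: the class name, fixed capacity and removal logic are made up for the sketch and differ from the actual patch, which also has to handle GC visits, inflation when the stack is full, and unstructured exits.

```
#include <cassert>

// One instance per thread; "Oop" stands in for an object reference.
typedef void* Oop;

class LockStackSketch {
  static const int kCapacity = 8;   // in practice 3-5 entries are typical
  Oop _elems[kCapacity];
  int _top;
public:
  LockStackSketch() : _top(0) {}

  bool full() const { return _top == kCapacity; }

  // Called after a successful fast-lock CAS of the header's low bits to 00.
  void push(Oop o) {
    assert(!full());                // the patch would inflate instead
    _elems[_top++] = o;
  }

  // Called on fast-unlock; scanning from the top also copes with
  // unstructured exit orders.
  void remove(Oop o) {
    for (int i = _top - 1; i >= 0; i--) {
      if (_elems[i] == o) {
        for (int j = i; j < _top - 1; j++) {
          _elems[j] = _elems[j + 1];
        }
        _top--;
        return;
      }
    }
  }

  // "Does the current thread own this object?" is a short linear scan.
  bool contains(Oop o) const {
    for (int i = 0; i < _top; i++) {
      if (_elems[i] == o) return true;
    }
    return false;
  }
};
```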
#### DaCapo/AArch64 Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? benchmark | baseline | fast-locking | % | size -- | -- | -- | -- | -- avrora | 27859 | 27563 | 1.07% | large batik | 20786 | 20847 | -0.29% | large biojava | 27421 | 27334 | 0.32% | default eclipse | 59918 | 60522 | -1.00% | large fop | 3670 | 3678 | -0.22% | default graphchi | 2088 | 2060 | 1.36% | default h2 | 297391 | 291292 | 2.09% | huge jme | 8762 | 8877 | -1.30% | default jython | 18938 | 18878 | 0.32% | default luindex | 1339 | 1325 | 1.06% | default lusearch | 918 | 936 | -1.92% | default pmd | 58291 | 58423 | -0.23% | large sunflow | 32617 | 24961 | 30.67% | large tomcat | 25481 | 25992 | -1.97% | large tradebeans | 314640 | 311706 | 0.94% | huge tradesoap | 107473 | 110246 | -2.52% | huge xalan | 6047 | 5882 | 2.81% | default zxing | 970 | 926 | 4.75% | default #### DaCapo/x86_64 The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. benchmark | baseline | fast-Locking | % | size -- | -- | -- | -- | -- avrora | 127690 | 126749 | 0.74% | large batik | 12736 | 12641 | 0.75% | large biojava | 15423 | 15404 | 0.12% | default eclipse | 41174 | 41498 | -0.78% | large fop | 2184 | 2172 | 0.55% | default graphchi | 1579 | 1560 | 1.22% | default h2 | 227614 | 230040 | -1.05% | huge jme | 8591 | 8398 | 2.30% | default jython | 13473 | 13356 | 0.88% | default luindex | 824 | 813 | 1.35% | default lusearch | 962 | 968 | -0.62% | default pmd | 40827 | 39654 | 2.96% | large sunflow | 53362 | 43475 | 22.74% | large tomcat | 27549 | 28029 | -1.71% | large tradebeans | 190757 | 190994 | -0.12% | huge tradesoap | 68099 | 67934 | 0.24% | huge xalan | 7969 | 8178 | -2.56% | default zxing | 1176 | 1148 | 2.44% | default #### Renaissance/AArch64 This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
benchmark | baseline | fast-locking | % -- | -- | -- | -- AkkaUct | 2558.832 | 2513.594 | 1.80% Reactors | 14715.626 | 14311.246 | 2.83% Als | 1851.485 | 1869.622 | -0.97% ChiSquare | 1007.788 | 1003.165 | 0.46% GaussMix | 1157.491 | 1149.969 | 0.65% LogRegression | 717.772 | 733.576 | -2.15% MovieLens | 7916.181 | 8002.226 | -1.08% NaiveBayes | 395.296 | 386.611 | 2.25% PageRank | 4294.939 | 4346.333 | -1.18% FjKmeans | 519.2 | 498.357 | 4.18% FutureGenetic | 2578.504 | 2589.255 | -0.42% Mnemonics | 4898.886 | 4903.689 | -0.10% ParMnemonics | 4260.507 | 4210.121 | 1.20% Scrabble | 139.37 | 138.312 | 0.76% RxScrabble | 320.114 | 322.651 | -0.79% Dotty | 1056.543 | 1068.492 | -1.12% ScalaDoku | 3443.117 | 3449.477 | -0.18% Philosophers | 24333.311 | 23438.22 | 3.82% ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% FinagleChirper | 6814.192 | 6853.38 | -0.57% FinagleHttp | 4762.902 | 4807.564 | -0.93% #### Renaissance/x86_64 benchmark | baseline | fast-locking | % -- | -- | -- | -- AkkaUct | 1117.185 | 1116.425 | 0.07% Reactors | 11561.354 | 11812.499 | -2.13% Als | 1580.838 | 1575.318 | 0.35% ChiSquare | 459.601 | 467.109 | -1.61% GaussMix | 705.944 | 685.595 | 2.97% LogRegression | 659.944 | 656.428 | 0.54% MovieLens | 7434.303 | 7592.271 | -2.08% NaiveBayes | 413.482 | 417.369 | -0.93% PageRank | 3259.233 | 3276.589 | -0.53% FjKmeans | 946.429 | 938.991 | 0.79% FutureGenetic | 1760.672 | 1815.272 | -3.01% Scrabble | 147.996 | 150.084 | -1.39% RxScrabble | 177.755 | 177.956 | -0.11% Dotty | 673.754 | 683.919 | -1.49% ScalaDoku | 2193.562 | 1958.419 | 12.01% ScalaKmeans | 165.376 | 168.925 | -2.10% ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. ### Testing - [x] tier1 (x86_64, aarch64, x86_32) - [x] tier2 (x86_64, aarch64) - [x] tier3 (x86_64, aarch64) - [x] tier4 (x86_64, aarch64) ------------- Commit messages: - Merge tag 'jdk-20+17' into fast-locking - Fix OSR packing in AArch64, part 2 - Fix OSR packing in AArch64 - Merge remote-tracking branch 'upstream/master' into fast-locking - Fix register in interpreter unlock x86_32 - Support unstructured locking in interpreter (x86 parts) - Support unstructured locking in interpreter (aarch64 and shared parts) - Merge branch 'master' into fast-locking - Merge branch 'master' into fast-locking - Added test for hand-over-hand locking - ... 
and 17 more: https://git.openjdk.org/jdk/compare/79ccc791...3ed51053 Changes: https://git.openjdk.org/jdk/pull/10590/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8291555 Stats: 3660 lines in 127 files changed: 650 ins; 2481 del; 529 mod Patch: https://git.openjdk.org/jdk/pull/10590.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10590/head:pull/10590 PR: https://git.openjdk.org/jdk/pull/10590 From stefank at openjdk.org Thu Oct 6 11:44:15 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 6 Oct 2022 11:44:15 GMT Subject: RFR: 8294238: ZGC: Move CLD claimed mark clearing Message-ID: When we claim CLDs during object iteration, we must make sure to have a cleared set of claim bits. Today we ensure this by clearing the bits before object iteration starts. Most GCs perform this clearing during a stop-the-world pause, before the actual GC marking starts. ZGC, however, performs the clearing concurrently. This requires us to be very careful and never start following object references before the clearing has completed. In the Generational ZGC repository, we changed it so that the code that performs the object iteration cleans up and clears these bits after itself. This has the effect that when marking starts, we know that the claimed bits have been cleared. I'd like to change the single-generation ZGC to do the same. ------------- Commit messages: - 8294238: ZGC: Move CLD claimed mark clearing Changes: https://git.openjdk.org/jdk/pull/10591/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10591&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294238 Stats: 28 lines in 7 files changed: 26 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10591.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10591/head:pull/10591 PR: https://git.openjdk.org/jdk/pull/10591 From coleenp at openjdk.org Thu Oct 6 12:02:18 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Oct 2022 12:02:18 GMT Subject: RFR: 8294238: ZGC: Move CLD claimed mark clearing In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 11:20:40 GMT, Stefan Karlsson wrote: > When we claim CLDs during object iteration, we must make sure to have a cleared set of claim bits. Today we ensure this by clearing the bits before object iteration starts. Most GCs perform this clearing during a stop-the-world pause, before the actual GC marking starts. > > ZGC, however, performs the clearing concurrently. This requires us to be very careful and never start following object references before the clearing has completed. > > In the Generational ZGC repository, we changed it so that the code that performs the object iteration cleans up and clears these bits after itself. This has the effect that when marking starts, we know that the claimed bits have been cleared. > > I'd like to change the single-generation ZGC to do the same. CLD bits ok with me. The ClassLoaderData::verify_not_claimed should probably be in an #ifdef ASSERT conditional. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/10591 From stefank at openjdk.org Thu Oct 6 12:28:35 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 6 Oct 2022 12:28:35 GMT Subject: RFR: 8294238: ZGC: Move CLD claimed mark clearing In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 11:20:40 GMT, Stefan Karlsson wrote: > When we claim CLDs during object iteration, we must make sure to have a cleared set of claim bits. 
Today we ensure this by clearing the bits before object iteration starts. Most GCs perform this clearing during a stop-the-world pause, before the actual GC marking starts. > > ZGC, however, performs the clearing concurrently. This requires us to be very careful and never start following object references before the clearing has completed. > > In the Generational ZGC repository, we changed it so that the code that performs the object iteration cleans up and clears these bits after itself. This has the effect that when marking starts, we know that the claimed bits have been cleared. > > I'd like to change the single-generation ZGC to do the same. Thanks, Coleen. I've updated the patch with your suggestion. ------------- PR: https://git.openjdk.org/jdk/pull/10591 From stefank at openjdk.org Thu Oct 6 12:28:35 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 6 Oct 2022 12:28:35 GMT Subject: RFR: 8294238: ZGC: Move CLD claimed mark clearing [v2] In-Reply-To: References: Message-ID: > When we claim CLDs during object iteration, we must make sure to have a cleared set of claim bits. Today we ensure this by clearing the bits before object iteration starts. Most GCs perform this clearing during a stop-the-world pause, before the actual GC marking starts. > > ZGC, however, performs the clearing concurrently. This requires us to be very careful and never start following object references before the clearing has completed. > > In the Generational ZGC repository, we changed it so that the code that performs the object iteration cleans up and clears these bits after itself. This has the effect that when marking starts, we know that the claimed bits have been cleared. > > I'd like to change the single-generation ZGC to do the same. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Guard verify_not_claimed with ifdef ASSERT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10591/files - new: https://git.openjdk.org/jdk/pull/10591/files/fd2187ac..d7fdc8e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10591&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10591&range=00-01 Stats: 3 lines in 2 files changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10591.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10591/head:pull/10591 PR: https://git.openjdk.org/jdk/pull/10591 From jsjolen at openjdk.org Thu Oct 6 12:32:24 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 6 Oct 2022 12:32:24 GMT Subject: RFR: 8294308: Allow dynamically choosing the MEMFLAGS of a type without ResourceObj [v6] In-Reply-To: References: Message-ID: <_u6caJypBYTjD1Dqf2CVdgrFNa4N_L-ltXtwWknHOiI=.9a128aef-e365-424e-bd50-b3124e8bd3c7@github.com> > Here's a suggested solution for the ticket mentioned and a use case for outputStream. I'm not attached to the name. > > This saves space for all allocated outputStreams, which is nice. It also makes the purpose of ResourceObj more clear ("please handle the life cycle for me"), reducing the need for it. > > Thank you for considering it. Johan Sj?len has updated the pull request incrementally with five additional commits since the last revision: - Messed up the merge but now it compiles - Merge branch 'dyn-cheapobj-take2' into dyn-cheapobj - Fix style issues - Final refactoring? - Final refactoring? 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/10412/files - new: https://git.openjdk.org/jdk/pull/10412/files/7076784c..9a365985 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10412&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10412&range=04-05 Stats: 80 lines in 2 files changed: 11 ins; 41 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/10412.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10412/head:pull/10412 PR: https://git.openjdk.org/jdk/pull/10412 From jsjolen at openjdk.org Thu Oct 6 12:44:23 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 6 Oct 2022 12:44:23 GMT Subject: RFR: 8294308: Allow dynamically choosing the MEMFLAGS of a type without ResourceObj [v7] In-Reply-To: References: Message-ID: > Here's a suggested solution for the ticket mentioned and a use case for outputStream. I'm not attached to the name. > > This saves space for all allocated outputStreams, which is nice. It also makes the purpose of ResourceObj more clear ("please handle the life cycle for me"), reducing the need for it. > > Thank you for considering it. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Fix include order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10412/files - new: https://git.openjdk.org/jdk/pull/10412/files/9a365985..f8302de2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10412&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10412&range=05-06 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10412.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10412/head:pull/10412 PR: https://git.openjdk.org/jdk/pull/10412 From jwaters at openjdk.org Thu Oct 6 12:45:08 2022 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 6 Oct 2022 12:45:08 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v14] In-Reply-To: References: Message-ID: > Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. > > See https://bugs.openjdk.org/browse/JDK-8292016 and the related Pull Request for the reason that the existing JLI error reporting utility was not used in this enhancement Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge branch 'openjdk:master' into patch-4 - Merge branch 'openjdk:master' into patch-4 - Use - instead of : as a separator - Merge branch 'openjdk:master' into patch-4 - Make DLL_ERROR4 look a little better without changing what it means - Revert changes to JLI_ReportErrorMessageSys - Update java_md.c - Update java_md.h - Merge branch 'openjdk:master' into patch-4 - Merge branch 'openjdk:master' into patch-4 - ... 
and 6 more: https://git.openjdk.org/jdk/compare/3b1b25a0...c3113cac ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9749/files - new: https://git.openjdk.org/jdk/pull/9749/files/aadf6275..c3113cac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9749&range=12-13 Stats: 773 lines in 72 files changed: 327 ins; 144 del; 302 mod Patch: https://git.openjdk.org/jdk/pull/9749.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9749/head:pull/9749 PR: https://git.openjdk.org/jdk/pull/9749 From jwaters at openjdk.org Thu Oct 6 12:45:13 2022 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 6 Oct 2022 12:45:13 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v13] In-Reply-To: <8nmAhTC5tubNWCLn89kWO4hQaP9ILZvgkx1ZtqMS9yY=.c794b8f4-0e50-472b-9c9a-993a2b24d8d2@github.com> References: <8nmAhTC5tubNWCLn89kWO4hQaP9ILZvgkx1ZtqMS9yY=.c794b8f4-0e50-472b-9c9a-993a2b24d8d2@github.com> Message-ID: On Wed, 5 Oct 2022 16:40:16 GMT, Julian Waters wrote: >> Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. >> >> See https://bugs.openjdk.org/browse/JDK-8292016 and the related Pull Request for the reason that the existing JLI error reporting utility was not used in this enhancement > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Merge branch 'openjdk:master' into patch-4 > - Use - instead of : as a separator > - Merge branch 'openjdk:master' into patch-4 > - Make DLL_ERROR4 look a little better without changing what it means > - Revert changes to JLI_ReportErrorMessageSys > - Update java_md.c > - Update java_md.h > - Merge branch 'openjdk:master' into patch-4 > - Merge branch 'openjdk:master' into patch-4 > - Back out change to DLL_ERROR4 for separate RFE > - ... and 5 more: https://git.openjdk.org/jdk/compare/2923d2f4...aadf6275 Successful merge with latest, change should now be ready for review ------------- PR: https://git.openjdk.org/jdk/pull/9749 From duke at openjdk.org Thu Oct 6 13:08:32 2022 From: duke at openjdk.org (JervenBolleman) Date: Thu, 6 Oct 2022 13:08:32 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> References: <_l7L4QKD3xsKDBmYODw-ZByLKdKlymyNNMZU49ABkBg=.ce136e4e-24ba-434b-ba53-4f53a44ef915@github.com> Message-ID: On Thu, 28 Jul 2022 19:58:34 GMT, Roman Kennke wrote: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. 
> > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. 
This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? > > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. > > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
> > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 519.2 | 498.357 | 4.18% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) For those following along the new PR is https://github.com/openjdk/jdk/pull/10590 ------------- PR: https://git.openjdk.org/jdk/pull/9680 From ayang at openjdk.org Thu Oct 6 13:43:23 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 6 Oct 2022 13:43:23 GMT Subject: RFR: 8294907: Remove unused NativeLookup::dll_load Message-ID: Trivial change of removing dead code. ------------- Commit messages: - nativelookup-remove Changes: https://git.openjdk.org/jdk/pull/10595/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10595&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294907 Stats: 19 lines in 2 files changed: 0 ins; 19 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10595.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10595/head:pull/10595 PR: https://git.openjdk.org/jdk/pull/10595 From stefank at openjdk.org Thu Oct 6 13:44:21 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 6 Oct 2022 13:44:21 GMT Subject: RFR: 8294308: Allow dynamically choosing the MEMFLAGS of a type without ResourceObj [v7] In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 12:44:23 GMT, Johan Sj?len wrote: >> Here's a suggested solution for the ticket mentioned and a use case for outputStream. I'm not attached to the name. >> >> This saves space for all allocated outputStreams, which is nice. 
It also makes the purpose of ResourceObj more clear ("please handle the life cycle for me"), reducing the need for it. >> >> Thank you for considering it. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Fix include order Marked as reviewed by stefank (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10412 From coleenp at openjdk.org Thu Oct 6 13:51:22 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 6 Oct 2022 13:51:22 GMT Subject: RFR: 8294308: Allow dynamically choosing the MEMFLAGS of a type without ResourceObj [v7] In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 12:44:23 GMT, Johan Sj?len wrote: >> Here's a suggested solution for the ticket mentioned and a use case for outputStream. I'm not attached to the name. >> >> This saves space for all allocated outputStreams, which is nice. It also makes the purpose of ResourceObj more clear ("please handle the life cycle for me"), reducing the need for it. >> >> Thank you for considering it. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Fix include order Looks great. Thanks for these changes. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/10412 From lmesnik at openjdk.org Thu Oct 6 14:51:29 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 6 Oct 2022 14:51:29 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread In-Reply-To: References: Message-ID: On Wed, 5 Oct 2022 22:49:20 GMT, Serguei Spitsyn wrote: > The spec of JVM TI GetLocalXXX/SetLocalXXX functions is updated to require the target thread to be suspended. If not suspended then the JVMTI_ERROR_THREAD_NOT_SUSPENDED error code is returned by the implementation. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 > > A few tests are impacted by this fix: > > test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest > test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 > > > The following test has been removed as non-relevant any more: > ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` > > New negative test has been added instead: > ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` > > All JVM TI and JPDA tests were used locally for verification. > They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. > > Mach5 test runs on all platforms are TBD. test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java line 29: > 27: * @requires vm.continuations > 28: * @library /test/lib > 29: * @compile --enable-preview -source ${jdk.version} GetSetLocalUnsuspended.java You could use * @enablePreview instead of --enable-preview ------------- PR: https://git.openjdk.org/jdk/pull/10586 From sspitsyn at openjdk.org Thu Oct 6 17:25:09 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 6 Oct 2022 17:25:09 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 00:50:50 GMT, Leonid Mesnik wrote: >> The spec of JVM TI GetLocalXXX/SetLocalXXX functions is updated to require the target thread to be suspended. If not suspended then the JVMTI_ERROR_THREAD_NOT_SUSPENDED error code is returned by the implementation. 
>> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 >> >> A few tests are impacted by this fix: >> >> test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest >> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest >> test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 >> >> >> The following test has been removed as non-relevant any more: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` >> >> New negative test has been added instead: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` >> >> All JVM TI and JPDA tests were used locally for verification. >> They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. >> >> Mach5 test runs on all platforms are TBD. > > test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java line 29: > >> 27: * @requires vm.continuations >> 28: * @library /test/lib >> 29: * @compile --enable-preview -source ${jdk.version} GetSetLocalUnsuspended.java > > You could use * @enablePreview instead of --enable-preview Good suggestion. It seems to be working. But I had to remove the @compile line #29. I hope, it is okay. ------------- PR: https://git.openjdk.org/jdk/pull/10586 From sspitsyn at openjdk.org Thu Oct 6 17:31:00 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 6 Oct 2022 17:31:00 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v2] In-Reply-To: References: Message-ID: > The spec of JVM TI GetLocalXXX/SetLocalXXX functions is updated to require the target thread to be suspended. If not suspended then the JVMTI_ERROR_THREAD_NOT_SUSPENDED error code is returned by the implementation. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 > > A few tests are impacted by this fix: > > test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest > test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 > > > The following test has been removed as non-relevant any more: > ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` > > New negative test has been added instead: > ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` > > All JVM TI and JPDA tests were used locally for verification. > They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. > > Mach5 test runs on all platforms are TBD. 
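For agent writers, the practical consequence is the calling pattern sketched below: suspend the target thread before touching its locals, and expect JVMTI_ERROR_THREAD_NOT_SUSPENDED otherwise. The sketch assumes the agent holds the can_suspend and can_access_local_variables capabilities and already knows depth/slot (e.g. from GetLocalVariableTable); error handling is trimmed, and whether the current thread is exempt is defined by the spec text, not by this sketch.

```
#include <jvmti.h>

// Read an int local from another thread's frame under the updated rules.
static jint read_int_local(jvmtiEnv* jvmti, jthread thread,
                           jint depth, jint slot) {
  jint value = 0;

  if (jvmti->SuspendThread(thread) != JVMTI_ERROR_NONE) {
    return 0;                              // could not suspend: give up
  }

  jvmtiError err = jvmti->GetLocalInt(thread, depth, slot, &value);
  if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
    // This is what an agent now gets if it skips SuspendThread for a
    // target thread that is not suspended.
  }

  jvmti->ResumeThread(thread);
  return value;
}
```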
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: addressed review comments about is_JavaThread_current and @enablePreview tag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10586/files - new: https://git.openjdk.org/jdk/pull/10586/files/b2c341f1..5991659f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10586&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10586&range=00-01 Stats: 16 lines in 2 files changed: 9 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10586.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10586/head:pull/10586 PR: https://git.openjdk.org/jdk/pull/10586 From sspitsyn at openjdk.org Thu Oct 6 17:31:03 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 6 Oct 2022 17:31:03 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v2] In-Reply-To: <7KcCEr8Byi3GKWeNrTJJedUNsjS_8e0V2-rymIVxVrM=.9025b65b-980c-44d3-8e4d-9e78c9d5d513@github.com> References: <7KcCEr8Byi3GKWeNrTJJedUNsjS_8e0V2-rymIVxVrM=.9025b65b-980c-44d3-8e4d-9e78c9d5d513@github.com> Message-ID: <0DUVv5cx4seGdyK6MVK2T4eoVAGoZXMWrgnYMOBlSfc=.8df46a96-8a30-4987-88c4-3bf5df03fd1d@github.com> On Thu, 6 Oct 2022 07:16:44 GMT, Dmitry Samersoff wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> addressed review comments about is_JavaThread_current and @enablePreview tag > > src/hotspot/share/prims/jvmtiEnvBase.hpp line 180: > >> 178: JavaThread* current = JavaThread::current(); >> 179: oop cur_obj = current->jvmti_vthread(); >> 180: bool is_current = jt == current && (cur_obj == NULL || cur_obj == thr_obj); > > It might be better to restructure this "if" and check for jt==current before we ask for cur_obj, or at least add brackets. Thank you for the comment. I've refactored it a little bit. Please, let me know if you agree with it. ------------- PR: https://git.openjdk.org/jdk/pull/10586 From amenkov at openjdk.org Thu Oct 6 17:44:22 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 6 Oct 2022 17:44:22 GMT Subject: RFR: 7124710: interleaved RedefineClasses() and RetransformClasses() calls may have a problem [v7] In-Reply-To: References: Message-ID: <5e7hJ80RXaHgAazmGH3VnvwYTtYNKLqym4oKz457uKY=.dcd02e54-4be7-4717-8c55-e30b38e6ede0@github.com> > The problem is RedefineClasses does not update cached_class_bytes, so subsequent RetransformClasses gets obsolete class bytes (this are testcases 3-6 from the new test) > > cached_class_bytes are set when an agent instruments the class from ClassFileLoadHook. > After successful RedefineClasses it should be reset. > The fix updates ClassFileLoadHook caller to not use old cached_class_bytes with RedefineClasses (if some agent instruments the class, new cached_class_bytes are allocated for scratch_class) and updates cached_class_bytes after successful RedefineClasses or RetransformClasses. 
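To make the interleaving concrete, here is a rough agent-side sketch of the scenario being tested: a ClassFileLoadHook that modifies a class, followed later by a RetransformClasses call after the class has been redefined. The class name p/Target is hypothetical, callback registration and the actual bytecode rewriting are omitted, and when exactly the VM caches class bytes depends on agent capabilities and is not modelled here.

```
#include <jvmti.h>
#include <string.h>

// ClassFileLoadHook callback: returns a (trivially "instrumented") copy of
// the class bytes for the class of interest.
static void JNICALL
OnClassFileLoadHook(jvmtiEnv* jvmti, JNIEnv* jni,
                    jclass class_being_redefined, jobject loader,
                    const char* name, jobject protection_domain,
                    jint class_data_len, const unsigned char* class_data,
                    jint* new_class_data_len, unsigned char** new_class_data) {
  if (name == NULL || strcmp(name, "p/Target") != 0) {
    return;
  }
  unsigned char* copy = NULL;
  if (jvmti->Allocate(class_data_len, &copy) == JVMTI_ERROR_NONE) {
    memcpy(copy, class_data, class_data_len);   // real agents rewrite here
    *new_class_data_len = class_data_len;
    *new_class_data = copy;
  }
}

// Called after the class has been redefined elsewhere; with the fix the hook
// above should observe the redefined bytes rather than stale cached ones.
static jvmtiError retransform_target(jvmtiEnv* jvmti, jclass target) {
  return jvmti->RetransformClasses(1, &target);
}
```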
Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: Fixed compilation error on linux ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10032/files - new: https://git.openjdk.org/jdk/pull/10032/files/d853f23f..36c4ba60 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10032&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10032&range=05-06 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10032.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10032/head:pull/10032 PR: https://git.openjdk.org/jdk/pull/10032 From lmesnik at openjdk.org Thu Oct 6 23:28:25 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 6 Oct 2022 23:28:25 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v2] In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 17:31:00 GMT, Serguei Spitsyn wrote: >> The spec of JVM TI GetLocalXXX/SetLocalXXX functions is updated to require the target thread to be suspended. If not suspended then the JVMTI_ERROR_THREAD_NOT_SUSPENDED error code is returned by the implementation. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 >> >> A few tests are impacted by this fix: >> >> test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest >> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest >> test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 >> >> >> The following test has been removed as non-relevant any more: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` >> >> New negative test has been added instead: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` >> >> All JVM TI and JPDA tests were used locally for verification. >> They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. >> >> Mach5 test runs on all platforms are TBD. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comments about is_JavaThread_current and @enablePreview tag Marked as reviewed by lmesnik (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10586 From kbarrett at openjdk.org Fri Oct 7 00:31:24 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 7 Oct 2022 00:31:24 GMT Subject: RFR: 8294308: Allow dynamically choosing the MEMFLAGS of a type without ResourceObj [v7] In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 12:44:23 GMT, Johan Sj?len wrote: >> Here's a suggested solution for the ticket mentioned and a use case for outputStream. I'm not attached to the name. >> >> This saves space for all allocated outputStreams, which is nice. It also makes the purpose of ResourceObj more clear ("please handle the life cycle for me"), reducing the need for it. >> >> Thank you for considering it. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Fix include order Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/10412 From dholmes at openjdk.org Fri Oct 7 01:02:30 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 7 Oct 2022 01:02:30 GMT Subject: RFR: 8294907: Remove unused NativeLookup::dll_load In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 13:29:49 GMT, Albert Mingkun Yang wrote: > Trivial change of removing dead code. 
Looks good and trivial. Thanks for spotting and doing this cleanup. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/10595 From mbaesken at openjdk.org Fri Oct 7 07:37:21 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 7 Oct 2022 07:37:21 GMT Subject: RFR: JDK-8294901: remove pre-VS2017 checks in Windows related coding Message-ID: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> After "8293162: Drop support for VS2017" limited current support to VS2019 and VS2022 it is most likely safe to remove various checks/workarounds related to older VS compilers like VS2015 or VS2013. ------------- Commit messages: - JDK-8294901 Changes: https://git.openjdk.org/jdk/pull/10600/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10600&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294901 Stats: 38 lines in 4 files changed: 0 ins; 37 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10600.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10600/head:pull/10600 PR: https://git.openjdk.org/jdk/pull/10600 From dmitry.samersoff at bell-sw.com Fri Oct 7 08:33:49 2022 From: dmitry.samersoff at bell-sw.com (Dmitry Samersoff) Date: Fri, 7 Oct 2022 11:33:49 +0300 Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v2] In-Reply-To: <0DUVv5cx4seGdyK6MVK2T4eoVAGoZXMWrgnYMOBlSfc=.8df46a96-8a30-4987-88c4-3bf5df03fd1d@github.com> References: <7KcCEr8Byi3GKWeNrTJJedUNsjS_8e0V2-rymIVxVrM=.9025b65b-980c-44d3-8e4d-9e78c9d5d513@github.com> <0DUVv5cx4seGdyK6MVK2T4eoVAGoZXMWrgnYMOBlSfc=.8df46a96-8a30-4987-88c4-3bf5df03fd1d@github.com> Message-ID: Hi Serguei, Looks good for me. Thank you! -Dmitry On 06/10/2022 20:31, Serguei Spitsyn wrote: > On Thu, 6 Oct 2022 07:16:44 GMT, Dmitry Samersoff wrote: > >>> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >>> >>> addressed review comments about is_JavaThread_current and @enablePreview tag >> >> src/hotspot/share/prims/jvmtiEnvBase.hpp line 180: >> >>> 178: JavaThread* current = JavaThread::current(); >>> 179: oop cur_obj = current->jvmti_vthread(); >>> 180: bool is_current = jt == current && (cur_obj == NULL || cur_obj == thr_obj); >> >> It might be better to restructure this "if" and check for jt==current before we ask for cur_obj, or at least add brackets. > > Thank you for the comment. > I've refactored it a little bit. Please, let me know if you agree with it. > > ------------- > > PR: https://git.openjdk.org/jdk/pull/10586 -- Dmitry.Samersoff at bell-sw.com Technical Professional at BellSoft From ayang at openjdk.org Fri Oct 7 08:58:42 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 7 Oct 2022 08:58:42 GMT Subject: RFR: 8294907: Remove unused NativeLookup::dll_load In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 13:29:49 GMT, Albert Mingkun Yang wrote: > Trivial change of removing dead code. Thanks for the review. ------------- PR: https://git.openjdk.org/jdk/pull/10595 From ayang at openjdk.org Fri Oct 7 08:58:43 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 7 Oct 2022 08:58:43 GMT Subject: Integrated: 8294907: Remove unused NativeLookup::dll_load In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 13:29:49 GMT, Albert Mingkun Yang wrote: > Trivial change of removing dead code. This pull request has now been integrated. 
Changeset: 118d93b3 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/118d93b3dc5bafc00dea03dba97446a04d919fd5 Stats: 19 lines in 2 files changed: 0 ins; 19 del; 0 mod 8294907: Remove unused NativeLookup::dll_load Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/10595 From sspitsyn at openjdk.org Fri Oct 7 09:04:31 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 7 Oct 2022 09:04:31 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v2] In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 17:31:00 GMT, Serguei Spitsyn wrote: >> The spec of JVM TI GetLocalXXX/SetLocalXXX functions is updated to require the target thread to be suspended. If not suspended then the JVMTI_ERROR_THREAD_NOT_SUSPENDED error code is returned by the implementation. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 >> >> A few tests are impacted by this fix: >> >> test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest >> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest >> test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 >> >> >> The following test has been removed as non-relevant any more: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` >> >> New negative test has been added instead: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` >> >> All JVM TI and JPDA tests were used locally for verification. >> They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. >> >> Mach5 test runs on all platforms are TBD. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comments about is_JavaThread_current and @enablePreview tag PING: Could someone review the CSR, please? ------------- PR: https://git.openjdk.org/jdk/pull/10586 From serguei.spitsyn at oracle.com Fri Oct 7 09:04:54 2022 From: serguei.spitsyn at oracle.com (Serguei Spitsyn) Date: Fri, 7 Oct 2022 09:04:54 +0000 Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v2] In-Reply-To: References: <7KcCEr8Byi3GKWeNrTJJedUNsjS_8e0V2-rymIVxVrM=.9025b65b-980c-44d3-8e4d-9e78c9d5d513@github.com> <0DUVv5cx4seGdyK6MVK2T4eoVAGoZXMWrgnYMOBlSfc=.8df46a96-8a30-4987-88c4-3bf5df03fd1d@github.com> Message-ID: Hi Dmitry, Do you have any plans to full review and approve this PR? Thanks, Serguei From: hotspot-dev on behalf of Dmitry Samersoff Date: Friday, October 7, 2022 at 1:34 AM To: Serguei Spitsyn , hotspot-dev at openjdk.org , serviceability-dev at openjdk.org Subject: Re: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v2] Hi Serguei, Looks good for me. Thank you! -Dmitry On 06/10/2022 20:31, Serguei Spitsyn wrote: > On Thu, 6 Oct 2022 07:16:44 GMT, Dmitry Samersoff wrote: > >>> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >>> >>> addressed review comments about is_JavaThread_current and @enablePreview tag >> >> src/hotspot/share/prims/jvmtiEnvBase.hpp line 180: >> >>> 178: JavaThread* current = JavaThread::current(); >>> 179: oop cur_obj = current->jvmti_vthread(); >>> 180: bool is_current = jt == current && (cur_obj == NULL || cur_obj == thr_obj); >> >> It might be better to restructure this "if" and check for jt==current before we ask for cur_obj, or at least add brackets. 
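For illustration, one possible shape of the restructuring suggested above (a sketch only, reusing the names from the quoted snippet; the actual refactoring is whatever was pushed to the PR):

  // Sketch: test jt == current first, so jvmti_vthread() is only read for the current thread.
  bool is_current = false;
  if (jt == JavaThread::current()) {
    oop cur_obj = jt->jvmti_vthread();
    is_current = (cur_obj == NULL || cur_obj == thr_obj);
  }
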
> > Thank you for the comment. > I've refactored it a little bit. Please, let me know if you agree with it. > > ------------- > > PR: https://git.openjdk.org/jdk/pull/10586 -- Dmitry.Samersoff at bell-sw.com Technical Professional at BellSoft -------------- next part -------------- An HTML attachment was scrubbed... URL: From alanb at openjdk.org Fri Oct 7 09:15:31 2022 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 7 Oct 2022 09:15:31 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v2] In-Reply-To: References: Message-ID: <652elU9x7gEeP2r-fSMwWEReMBrRrHjGI3uv7Kt48ew=.37d46bd7-328d-4ea3-9bb2-f0d3499ddb20@github.com> On Fri, 7 Oct 2022 09:02:11 GMT, Serguei Spitsyn wrote: > PING: Could someone review the CSR, please? I reviewed it yesterday. ------------- PR: https://git.openjdk.org/jdk/pull/10586 From sspitsyn at openjdk.org Fri Oct 7 09:23:20 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 7 Oct 2022 09:23:20 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v2] In-Reply-To: References: Message-ID: <2nLj2H-Fp8RkVHpNhHtCbOrIXqTsU7faDfMXjBaU_zA=.39c2d81e-7228-49e2-95b0-58d5afbd4f3f@github.com> On Thu, 6 Oct 2022 17:31:00 GMT, Serguei Spitsyn wrote: >> The spec of JVM TI GetLocalXXX/SetLocalXXX functions is updated to require the target thread to be suspended. If not suspended then the JVMTI_ERROR_THREAD_NOT_SUSPENDED error code is returned by the implementation. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 >> >> A few tests are impacted by this fix: >> >> test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest >> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest >> test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 >> >> >> The following test has been removed as non-relevant any more: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` >> >> New negative test has been added instead: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` >> >> All JVM TI and JPDA tests were used locally for verification. >> They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. >> >> Mach5 test runs on all platforms are TBD. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comments about is_JavaThread_current and @enablePreview tag Thank you, Alan! I did not refresh my browser, an so, did not see your review. ------------- PR: https://git.openjdk.org/jdk/pull/10586 From fyang at openjdk.org Fri Oct 7 09:55:22 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 7 Oct 2022 09:55:22 GMT Subject: RFR: 8294366: RISC-V: Partially mark out incompressible regions In-Reply-To: <7aIMgQ7ci4WeF_NN0WAc5e5WwgUOYochPsDaW6a5P0Q=.d9d035be-958f-4c00-916f-1dc45fad6c0e@github.com> References: <7aIMgQ7ci4WeF_NN0WAc5e5WwgUOYochPsDaW6a5P0Q=.d9d035be-958f-4c00-916f-1dc45fad6c0e@github.com> Message-ID: On Mon, 26 Sep 2022 12:19:15 GMT, Xiaolin Zheng wrote: > Shortly, the current RVC implementation in the RISC-V backend is a "whitelist mode", merely compressing instructions marked by "CompressibleRegion" that covers just part of C2 matching rules and stub code (only ~5% compression rate). 
Due to the originally existing large backend code base and to spread the coverage to nearly all instructions generated by the backend, we cannot modify them little by little, but should implement a "blacklist mode" (a compression rate to ~20% if complete), to exclude compressions from:
> 1. relocations
> 2. patchable instructions
> 3. fixed length code slices whose code size is calculated
>
> Please check the discussions in the riscv-port mailing list[1] to go in for more details.
>
> This patch contains the first half of implementations which does not change the program behavior: it just introduces an "IncompressibleRegion" to indicate a piece of code "not compressible", marking out the patchable instructions and fixed-length code slices that are not able to compress by RVC.
>
> Besides, this patch also temporarily removes some automatic compression logic of branch instructions like "jal"s and "beq"s, for MachBranchNodes' fake labels could hamper the automatic compression as well and for making code clean. Please also check the discussions in the thread[1].
>
> Please check the unsquashed commits to have a better review of the patch.
>
>
> Tested a hotspot tier1 and tier2 with the option UseRVC turning on.
>
> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html
>
> Thanks,
> Xiaolin

I am fine to live with this change, which makes explicit up front those regions that are invalid for compressed instructions. But I have one comment.

src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp line 327:

> 325: // Make it a NOP.
> 326: assert_alignment(pc());
> 327: nop(false); // 4 bytes

Normally, the parameters of the assembler functions denote the operands of the instruction. It looks strange to me for the nop() assembler function to take one parameter. With that consideration, I think it might be better to make this explicit with an incompressible region mark. Something like:

{
  IncompressibleRegion ir(this);
  nop();
}

-------------

Changes requested by fyang (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10421

From jsjolen at openjdk.org  Fri Oct 7 11:12:30 2022
From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=)
Date: Fri, 7 Oct 2022 11:12:30 GMT
Subject: Integrated: 8294308: Allow dynamically choosing the MEMFLAGS of a type without ResourceObj
In-Reply-To: 
References: 
Message-ID: <6pG0Bn_SP1OQzpMCPGvABQmzr4j2CQafsDshF1sQ6kI=.6a47e27b-a2da-4111-ae87-8200fbbe22db@github.com>

On Fri, 23 Sep 2022 17:08:46 GMT, Johan Sjölen wrote:

> Here's a suggested solution for the ticket mentioned and a use case for outputStream. I'm not attached to the name.
>
> This saves space for all allocated outputStreams, which is nice. It also makes the purpose of ResourceObj more clear ("please handle the life cycle for me"), reducing the need for it.
>
> Thank you for considering it.

This pull request has now been integrated.
Changeset: b38bed6d Author: Johan Sj?len Committer: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/b38bed6d0ed6e1590a695a13a0d0c099e2bdd13a Stats: 102 lines in 14 files changed: 65 ins; 0 del; 37 mod 8294308: Allow dynamically choosing the MEMFLAGS of a type without ResourceObj Reviewed-by: coleenp, stefank, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/10412 From jsjolen at openjdk.org Fri Oct 7 11:35:09 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Oct 2022 11:35:09 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream Message-ID: Hi, I went through all of the places where LogStreams are created and removed the unnecessary ResourceMarks. I also added a ResourceMark in one place, where it was needed because of a call to `::name_and_sig_as_C_string` and moved one to the smallest scope where it is used. ------------- Commit messages: - Remove unnecessary ResourceMarks Changes: https://git.openjdk.org/jdk/pull/10602/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10602&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294954 Stats: 59 lines in 41 files changed: 2 ins; 57 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10602.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10602/head:pull/10602 PR: https://git.openjdk.org/jdk/pull/10602 From dsamersoff at openjdk.org Fri Oct 7 12:48:32 2022 From: dsamersoff at openjdk.org (Dmitry Samersoff) Date: Fri, 7 Oct 2022 12:48:32 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v2] In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 17:31:00 GMT, Serguei Spitsyn wrote: >> The spec of JVM TI GetLocalXXX/SetLocalXXX functions is updated to require the target thread to be suspended. If not suspended then the JVMTI_ERROR_THREAD_NOT_SUSPENDED error code is returned by the implementation. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 >> >> A few tests are impacted by this fix: >> >> test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest >> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest >> test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 >> >> >> The following test has been removed as non-relevant any more: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` >> >> New negative test has been added instead: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` >> >> All JVM TI and JPDA tests were used locally for verification. >> They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. >> >> Mach5 test runs on all platforms are TBD. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comments about is_JavaThread_current and @enablePreview tag Marked as reviewed by dsamersoff (Reviewer). 
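Relating back to the 8294954 RFR above (superfluous ResourceMarks with LogStream), a small sketch of the pattern in question. This is an illustration under the assumption stated in that RFR, not a quote from the actual patch; the log tag set is chosen arbitrarily.

  // Sketch: LogStream itself no longer needs a ResourceMark, so the mark is only
  // required where resource-allocated strings (e.g. name_and_sig_as_C_string) are built.
  LogTarget(Trace, jvmti) lt;
  if (lt.is_enabled()) {
    ResourceMark rm;   // for name_and_sig_as_C_string(), not for the stream
    LogStream ls(lt);
    ls.print_cr("method: %s", method->name_and_sig_as_C_string());
  }
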
------------- PR: https://git.openjdk.org/jdk/pull/10586 From alanb at openjdk.org Fri Oct 7 12:53:30 2022 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 7 Oct 2022 12:53:30 GMT Subject: RFR: 8294321: Fix typos in files under test/jdk/java, test/jdk/jdk, test/jdk/jni [v2] In-Reply-To: References: Message-ID: <-0yo8KceENmJ48YPNoHCUkx_iEWpIE0mPJn_-BkjbWY=.76a8dcb8-f43a-4c8b-8912-43c7225c183d@github.com> On Mon, 26 Sep 2022 16:51:36 GMT, Michael Ernst wrote: >> 8294321: Fix typos in files under test/jdk/java, test/jdk/jdk, test/jdk/jni > > Michael Ernst has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Reinstate typos in Apache code that is copied into the JDK > - Merge ../jdk-openjdk into typos-typos > - Remove file that was removed upstream > - Fix inconsistency in capitalization > - Undo change in zlip > - Fix typos src/java.se/share/data/jdwp/jdwp.spec line 101: > 99: "platform thread " > 100: "in the target VM. This includes platform threads created with the Thread " > 101: "API and all native threads attached to the target VM with JNI code." The spec for the JDWP AllThreads command was significantly reworded in Java 19 so this is where this typo crept in. We have JDK-8294672 tracking it to fix for Java 20, maybe you should take it? ------------- PR: https://git.openjdk.org/jdk/pull/10029 From dholmes at openjdk.org Fri Oct 7 13:05:36 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 7 Oct 2022 13:05:36 GMT Subject: RFR: JDK-8294901: remove pre-VS2017 checks in Windows related coding In-Reply-To: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> References: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> Message-ID: On Fri, 7 Oct 2022 07:29:16 GMT, Matthias Baesken wrote: > After "8293162: Drop support for VS2017" limited current support to VS2019 and VS2022 it is most likely safe to remove various checks/workarounds related to older VS compilers like VS2015 or VS2013. Seems reasonable. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/10600 From alanb at openjdk.org Fri Oct 7 13:19:47 2022 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 7 Oct 2022 13:19:47 GMT Subject: RFR: JDK-8294901: remove pre-VS2017 checks in Windows related coding In-Reply-To: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> References: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> Message-ID: <9wD8YwAnnj-KWVbiz81wfwgzhNi6UsUqA206T416Pgk=.88b21a0a-dc67-4384-aff7-754d6b978bd2@github.com> On Fri, 7 Oct 2022 07:29:16 GMT, Matthias Baesken wrote: > After "8293162: Drop support for VS2017" limited current support to VS2019 and VS2022 it is most likely safe to remove various checks/workarounds related to older VS compilers like VS2015 or VS2013. src/java.base/windows/native/libnio/ch/wepoll.c line 865: > 863: #define inline __inline > 864: #endif > 865: This is 3rd party code so best to leave it out of this change. 
------------- PR: https://git.openjdk.org/jdk/pull/10600 From dholmes at openjdk.org Fri Oct 7 13:21:21 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 7 Oct 2022 13:21:21 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: On Fri, 7 Oct 2022 11:19:55 GMT, Johan Sj?len wrote: > Hi, > > I went through all of the places where LogStreams are created and removed the unnecessary ResourceMarks. I also added a ResourceMark in one place, where it was needed because of a call to `::name_and_sig_as_C_string` and moved one to the smallest scope where it is used. How are you defining "unnecessary"? Are these unnecessary because there is zero resource allocation involved? Or "unnecessary" because a ResourceMark higher up the call stack covers it? ------------- PR: https://git.openjdk.org/jdk/pull/10602 From mbaesken at openjdk.org Fri Oct 7 13:29:48 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 7 Oct 2022 13:29:48 GMT Subject: RFR: JDK-8294901: remove pre-VS2017 checks in Windows related coding [v2] In-Reply-To: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> References: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> Message-ID: > After "8293162: Drop support for VS2017" limited current support to VS2019 and VS2022 it is most likely safe to remove various checks/workarounds related to older VS compilers like VS2015 or VS2013. Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: wepoll.c is 3rd party code, do not change it ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10600/files - new: https://git.openjdk.org/jdk/pull/10600/files/75d999ec..4dcdd911 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10600&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10600&range=00-01 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10600.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10600/head:pull/10600 PR: https://git.openjdk.org/jdk/pull/10600 From mbaesken at openjdk.org Fri Oct 7 13:29:49 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 7 Oct 2022 13:29:49 GMT Subject: RFR: JDK-8294901: remove pre-VS2017 checks in Windows related coding [v2] In-Reply-To: <9wD8YwAnnj-KWVbiz81wfwgzhNi6UsUqA206T416Pgk=.88b21a0a-dc67-4384-aff7-754d6b978bd2@github.com> References: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> <9wD8YwAnnj-KWVbiz81wfwgzhNi6UsUqA206T416Pgk=.88b21a0a-dc67-4384-aff7-754d6b978bd2@github.com> Message-ID: <2iqhUZ9LdbbteDx4y8RXCwxoeJPVB5aqiaaQAX2227M=.28830a9b-30b3-4e1d-866b-761440371e98@github.com> On Fri, 7 Oct 2022 13:16:27 GMT, Alan Bateman wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> wepoll.c is 3rd party code, do not change it > > src/java.base/windows/native/libnio/ch/wepoll.c line 865: > >> 863: #define inline __inline >> 864: #endif >> 865: > > This is 3rd party code so best to leave it out of this change. I agree, let's leave 3rd party code alone. I adjusted wepoll.c and kept the _MSC_VER check. 
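For context on the `_MSC_VER` check being kept: a hypothetical example of the kind of compiler-version guard involved. The exact condition used in wepoll.c may differ; this is only an illustration, and the third-party file itself stays as upstream ships it.

  /* Hypothetical sketch: older MSVC C compilers did not accept the C99 "inline"
   * keyword, so code like this maps it to the MSVC-specific __inline.
   * VS2015 reports _MSC_VER 1900; current toolchains do not need the fallback. */
  #if defined(_MSC_VER) && _MSC_VER < 1900
  #define inline __inline
  #endif
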
------------- PR: https://git.openjdk.org/jdk/pull/10600 From dholmes at openjdk.org Fri Oct 7 13:32:11 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 7 Oct 2022 13:32:11 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: On Fri, 7 Oct 2022 11:19:55 GMT, Johan Sj?len wrote: > Hi, > > I went through all of the places where LogStreams are created and removed the unnecessary ResourceMarks. I also added a ResourceMark in one place, where it was needed because of a call to `::name_and_sig_as_C_string` and moved one to the smallest scope where it is used. I see now the bug report suggests these RM were in place because the stream itself may have needed them but that this is no longer the case. So was that the only reason for all these RMs? ------------- PR: https://git.openjdk.org/jdk/pull/10602 From jwaters at openjdk.org Fri Oct 7 13:39:54 2022 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 7 Oct 2022 13:39:54 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v4] In-Reply-To: <0uuBnaZ4vxnwvwqPZzTniup_5ZxVSs_iq47NGmUzLFg=.09f9ee79-d50a-40ab-87aa-37a1a36e7699@github.com> References: <0uuBnaZ4vxnwvwqPZzTniup_5ZxVSs_iq47NGmUzLFg=.09f9ee79-d50a-40ab-87aa-37a1a36e7699@github.com> Message-ID: On Sun, 7 Aug 2022 07:57:49 GMT, Thomas Stuefe wrote: >> Nothing I could find in the tests that suggest they use this message as input, and none of them have failed with this patch, so I guess that's a good thing? :P >> >> I am slightly concerned with going the route of 2 different calls for WIN32 and the C runtime, since `JLI_ReportErrorMessageSys` is a shared function, only the implementations differ from platform to platform. I can't think of any solution off the top of my head to implement such a change without drastically changing either the Unix variant as well, or the declaration's signature in the shared header unfortunately. >> >> I was initially hesitant to change the formatting of spacing between the caller's message and system error, since the original intention for leaving it up to the caller may have been to allow for better flexibility. Also a concern was any behaviour differences that might result with the Unix variant, but it seems like the 2 format their messages entirely differently - While Windows appends the system error to the end of message without any formatting, Unix prints it on an entirely separate line above the message the caller passed it: >> >> JNIEXPORT void JNICALL >> JLI_ReportErrorMessageSys(const char* fmt, ...) { >> va_list vl; >> char *emsg; >> >> /* >> * TODO: its safer to use strerror_r but is not available on >> * Solaris 8. Until then.... >> */ >> emsg = strerror(errno); >> if (emsg != NULL) { >> fprintf(stderr, "%s\n", emsg); >> } >> >> va_start(vl, fmt); >> vfprintf(stderr, fmt, vl); >> fprintf(stderr, "\n"); >> va_end(vl); >> } >> >> >>> If you can make the fix for the CRT extra info small, I'd go for it. >> >> I don't quite get what you mean by that, should I revert the changes made to the freeit checks? > >> Nothing I could find in the tests that suggest they use this message as input, and none of them have failed with this patch, so I guess that's a good thing? :P > > Oh, that is fine :) Thanks for looking. I just mentioned it since you are new-ish. The tests that run as part of GHAs are only a tiny subset of all tests though, therefore there can be tests that fail and you would not notice. 
> > About the rest, starting to pull a thread somewhere and then noticing that the thread gets longer and longer is a normal thing. Then its up to you to decide whether fixing the isolated issue is worth it or whether more code needs to be reworked. > > Cheers, Thomas @tstuefe Do you think this should be ok now, since all reliance on [JDK-8292016](https://bugs.openjdk.org/browse/JDK-8292016) has been dropped? ------------- PR: https://git.openjdk.org/jdk/pull/9749 From jsjolen at openjdk.org Fri Oct 7 13:41:08 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Oct 2022 13:41:08 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: On Fri, 7 Oct 2022 13:28:58 GMT, David Holmes wrote: >I see now the bug report suggests these RM were in place because the stream itself may have needed them but that this is no longer the case. So was that the only reason for all these RMs? There are RMs that I've looked at but left intact because they did have other reasons for being there (typically: string allocating functions). So yes, `LogStream` should be the only reason for all these RMs. ------------- PR: https://git.openjdk.org/jdk/pull/10602 From jsjolen at openjdk.org Fri Oct 7 13:51:15 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 7 Oct 2022 13:51:15 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: <3RXwTxz1C1mjzFvf-yKczgP4lCERhQQsJdCej7iXrFE=.38a314e4-70b5-4356-8360-1fbbbf68230b@github.com> On Fri, 7 Oct 2022 11:19:55 GMT, Johan Sj?len wrote: > Hi, > > I went through all of the places where LogStreams are created and removed the unnecessary ResourceMarks. I also added a ResourceMark in one place, where it was needed because of a call to `::name_and_sig_as_C_string` and moved one to the smallest scope where it is used. This PR does remove the RM in `VM_Operation::evaluate`, and I haven't checked all of the VM operations to see if anyone uses it. ------------- PR: https://git.openjdk.org/jdk/pull/10602 From mbaesken at openjdk.org Fri Oct 7 14:04:51 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 7 Oct 2022 14:04:51 GMT Subject: RFR: JDK-8294901: remove pre-VS2017 checks in Windows related coding [v2] In-Reply-To: References: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> Message-ID: On Fri, 7 Oct 2022 13:29:48 GMT, Matthias Baesken wrote: >> After "8293162: Drop support for VS2017" limited current support to VS2019 and VS2022 it is most likely safe to remove various checks/workarounds related to older VS compilers like VS2015 or VS2013. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > wepoll.c is 3rd party code, do not change it btw while looking at VS-related references, the build docu seems to be a little outdated too https://github.com/openjdk/jdk/blob/master/doc/building.md states "Windows XP is not a supported platform, but all newer Windows should be able to build the JDK." That's most likely not true any more , the minimum requirements for VS2017/VS2019 are https://learn.microsoft.com/en-us/visualstudio/releases/2019/system-requirements https://learn.microsoft.com/en-us/visualstudio/releases/2017/vs2017-system-requirements-vs so it is Windows7 SP1/Windows Server 2016 minimum . 
(releases older than that but newer than XP like Vista or Server 2008 are not supported any more) ------------- PR: https://git.openjdk.org/jdk/pull/10600 From mdoerr at openjdk.org Fri Oct 7 14:43:19 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 7 Oct 2022 14:43:19 GMT Subject: RFR: JDK-8294901: remove pre-VS2017 checks in Windows related coding [v2] In-Reply-To: References: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> Message-ID: On Fri, 7 Oct 2022 13:29:48 GMT, Matthias Baesken wrote: >> After "8293162: Drop support for VS2017" limited current support to VS2019 and VS2022 it is most likely safe to remove various checks/workarounds related to older VS compilers like VS2015 or VS2013. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > wepoll.c is 3rd party code, do not change it This version LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.org/jdk/pull/10600 From dcubed at openjdk.org Fri Oct 7 15:03:08 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 7 Oct 2022 15:03:08 GMT Subject: RFR: 7124710: interleaved RedefineClasses() and RetransformClasses() calls may have a problem [v3] In-Reply-To: References: Message-ID: On Mon, 12 Sep 2022 20:10:45 GMT, Alex Menkov wrote: >> I like the fact that the fix is small and I really like the new test. I only >> have minor comments and a couple of questions. >> >> Please run these changes thru Tier[456] since that's where JVM/TI >> tests run in different configs w/ different options. > >> I like the fact that the fix is small and I really like the new test. I only have minor comments and a couple of questions. > > Thank you for the review > >> Please run these changes thru Tier[456] since that's where JVM/TI tests run in different configs w/ different options. > > In progress @alexmenkov - sorry for the delay in re-reviewing. I was out sick. ------------- PR: https://git.openjdk.org/jdk/pull/10032 From dcubed at openjdk.org Fri Oct 7 15:20:25 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 7 Oct 2022 15:20:25 GMT Subject: RFR: 7124710: interleaved RedefineClasses() and RetransformClasses() calls may have a problem [v7] In-Reply-To: <5e7hJ80RXaHgAazmGH3VnvwYTtYNKLqym4oKz457uKY=.dcd02e54-4be7-4717-8c55-e30b38e6ede0@github.com> References: <5e7hJ80RXaHgAazmGH3VnvwYTtYNKLqym4oKz457uKY=.dcd02e54-4be7-4717-8c55-e30b38e6ede0@github.com> Message-ID: On Thu, 6 Oct 2022 17:44:22 GMT, Alex Menkov wrote: >> The problem is RedefineClasses does not update cached_class_bytes, so subsequent RetransformClasses gets obsolete class bytes (this are testcases 3-6 from the new test) >> >> cached_class_bytes are set when an agent instruments the class from ClassFileLoadHook. >> After successful RedefineClasses it should be reset. >> The fix updates ClassFileLoadHook caller to not use old cached_class_bytes with RedefineClasses (if some agent instruments the class, new cached_class_bytes are allocated for scratch_class) and updates cached_class_bytes after successful RedefineClasses or RetransformClasses. > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Fixed compilation error on linux Thanks for making the minor changes. 
test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineRetransform/libRedefineRetransform.cpp line 48:

> 46: /*
> 47: * Helper class for data exchange between RedefineClasses/RetransformClasses and
> 48: * ClassFileLoadHook callback (saves class bytes passed to CFLH,

typo: s/passed to/to be passed to/

test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineRetransform/libRedefineRetransform.cpp line 49:

> 47: * Helper class for data exchange between RedefineClasses/RetransformClasses and
> 48: * ClassFileLoadHook callback (saves class bytes passed to CFLH,
> 49: * allows to set new class bytes to return from CFLH).

typo: s/allows to set new/allows setting new/

-------------

Marked as reviewed by dcubed (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10032

From dcubed at openjdk.org  Fri Oct 7 15:20:28 2022
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Fri, 7 Oct 2022 15:20:28 GMT
Subject: RFR: 7124710: interleaved RedefineClasses() and RetransformClasses() calls may have a problem [v3]
In-Reply-To: <8quzXh-4dzhc3QOdxICE7br12AsKJbLMfSU_55X1mRQ=.fa398076-a388-4c70-906f-fdc1cfc301ba@github.com>
References: <8quzXh-4dzhc3QOdxICE7br12AsKJbLMfSU_55X1mRQ=.fa398076-a388-4c70-906f-fdc1cfc301ba@github.com>
Message-ID: 

On Mon, 12 Sep 2022 20:23:09 GMT, Alex Menkov wrote:

>> test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineRetransform/RedefineRetransform.java line 240:
>>
>>> 238: case 5:
>>> 239: test("Redefine-Retransform-Redefine-Retransform with CFLH", () -> {
>>> 240: redefine(1, 5); // CFLH sets cached class bytes to ver 1
>>
>> I'm having trouble understanding why the CFLH version is '5' here.
>> Update: I _think_ this is just to have the CFLH return a different version
>> of the class bytes before the RedefineClasses() call does its work. I
>> don't understand why you want to do this...
>
> Test cases 1-4 are from the bug description.
> I added test cases 5 & 6 to verify additional code paths - they are the same as 3 & 4, but in RedefineClasses we provide new class bytes in CFLH.
> I.e. in cases 3 and 4, after RedefineClasses the classes have no cached bytes and the class bytes are reconstituted in the subsequent Retransform;
> in cases 5 and 6, the cache_bytes buffer is created during RedefineClasses and RetransformClasses uses the existing cache.

Thanks for the explanation.

>> test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineRetransform/libRedefineRetransform.cpp line 48:
>>
>>> 46: }
>>> 47:
>>> 48: class ClassFileLoadHookHelper {
>>
>> A short comment describing the purpose of the `ClassFileLoadHookHelper` would
>> be helpful to folks that only have a high level understanding of RedefineClasses()
>> and RetransformClasses().
>>
>> You did a very good job encapsulating support for a complicated set of APIs
>> into this helper.
>
> Added short description

Thanks.
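For readers less familiar with this area, a minimal sketch of the ClassFileLoadHook shape that the test's helper class wraps. This is an illustration only; the helper names (save_observed_bytes and friends) and the class name "TestedClass" are hypothetical and not taken from the actual test. Agent file context (jvmti.h, string.h) is assumed.

  // Sketch: observe the class bytes the VM passes to CFLH and optionally return
  // replacement bytes; memory handed back to the VM must come from Allocate().
  static void JNICALL
  ClassFileLoadHook(jvmtiEnv* jvmti, JNIEnv* jni,
                    jclass class_being_redefined, jobject loader,
                    const char* name, jobject protection_domain,
                    jint class_data_len, const unsigned char* class_data,
                    jint* new_class_data_len, unsigned char** new_class_data) {
    if (name == nullptr || strcmp(name, "TestedClass") != 0) {
      return;  // not the class under test ("TestedClass" is a placeholder)
    }
    save_observed_bytes(class_data, class_data_len);   // hypothetical helper
    if (have_replacement_bytes()) {                    // hypothetical helper
      unsigned char* buf = nullptr;
      if (jvmti->Allocate(replacement_len(), &buf) == JVMTI_ERROR_NONE) {
        memcpy(buf, replacement_bytes(), replacement_len());
        *new_class_data = buf;
        *new_class_data_len = replacement_len();
      }
    }
  }
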
>> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 >> >> A few tests are impacted by this fix: >> >> test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest >> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest >> test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 >> >> >> The following test has been removed as non-relevant any more: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` >> >> New negative test has been added instead: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` >> >> All JVM TI and JPDA tests were used locally for verification. >> They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. >> >> Mach5 test runs on all platforms are TBD. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comments about is_JavaThread_current and @enablePreview tag Leonid and Dmitry, thank you for review! ------------- PR: https://git.openjdk.org/jdk/pull/10586 From kbarrett at openjdk.org Fri Oct 7 17:53:20 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 7 Oct 2022 17:53:20 GMT Subject: RFR: JDK-8294901: remove pre-VS2017 checks in Windows related coding [v2] In-Reply-To: References: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> Message-ID: On Fri, 7 Oct 2022 13:29:48 GMT, Matthias Baesken wrote: >> After "8293162: Drop support for VS2017" limited current support to VS2019 and VS2022 it is most likely safe to remove various checks/workarounds related to older VS compilers like VS2015 or VS2013. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > wepoll.c is 3rd party code, do not change it Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/10600 From amenkov at openjdk.org Fri Oct 7 19:44:33 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 7 Oct 2022 19:44:33 GMT Subject: RFR: 7124710: interleaved RedefineClasses() and RetransformClasses() calls may have a problem [v8] In-Reply-To: References: Message-ID: > The problem is RedefineClasses does not update cached_class_bytes, so subsequent RetransformClasses gets obsolete class bytes (this are testcases 3-6 from the new test) > > cached_class_bytes are set when an agent instruments the class from ClassFileLoadHook. > After successful RedefineClasses it should be reset. > The fix updates ClassFileLoadHook caller to not use old cached_class_bytes with RedefineClasses (if some agent instruments the class, new cached_class_bytes are allocated for scratch_class) and updates cached_class_bytes after successful RedefineClasses or RetransformClasses. 
Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: Updated comments per Dan's request ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10032/files - new: https://git.openjdk.org/jdk/pull/10032/files/36c4ba60..63b5589f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10032&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10032&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10032.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10032/head:pull/10032 PR: https://git.openjdk.org/jdk/pull/10032 From amenkov at openjdk.org Fri Oct 7 20:06:29 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 7 Oct 2022 20:06:29 GMT Subject: RFR: 7124710: interleaved RedefineClasses() and RetransformClasses() calls may have a problem [v7] In-Reply-To: References: <5e7hJ80RXaHgAazmGH3VnvwYTtYNKLqym4oKz457uKY=.dcd02e54-4be7-4717-8c55-e30b38e6ede0@github.com> Message-ID: On Fri, 7 Oct 2022 15:15:25 GMT, Daniel D. Daugherty wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed compilation error on linux > > test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineRetransform/libRedefineRetransform.cpp line 48: > >> 46: /* >> 47: * Helper class for data exchange between RedefineClasses/RetransformClasses and >> 48: * ClassFileLoadHook callback (saves class bytes passed to CFLH, > > typo: s/passed to/to be passed to/ Done > test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineRetransform/libRedefineRetransform.cpp line 49: > >> 47: * Helper class for data exchange between RedefineClasses/RetransformClasses and >> 48: * ClassFileLoadHook callback (saves class bytes passed to CFLH, >> 49: * allows to set new class bytes to return from CFLH). > > typo: s/allows to set new/allows setting new/ Done ------------- PR: https://git.openjdk.org/jdk/pull/10032 From njian at openjdk.org Sat Oct 8 15:34:47 2022 From: njian at openjdk.org (Ningsheng Jian) Date: Sat, 8 Oct 2022 15:34:47 GMT Subject: RFR: 8294261: AArch64: Use pReg instead of pRegGov when possible In-Reply-To: References: Message-ID: On Mon, 3 Oct 2022 14:29:56 GMT, Nick Gasson wrote: > Looks OK to me! Thanks for the review! @nick-arm ------------- PR: https://git.openjdk.org/jdk/pull/10461 From xgong at openjdk.org Sat Oct 8 15:36:32 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Sat, 8 Oct 2022 15:36:32 GMT Subject: RFR: 8293409: [vectorapi] Intrinsify VectorSupport.indexVector In-Reply-To: References: Message-ID: On Mon, 19 Sep 2022 08:51:24 GMT, Xiaohong Gong wrote: > "`VectorSupport.indexVector()`" is used to compute a vector that contains the index values based on a given vector and a scale value (`i.e. index = vec + iota * scale`). This function is widely used in other APIs like "`VectorMask.indexInRange`" which is useful to the tail loop vectorization. And it can be easily implemented with the vector instructions. > > This patch adds the vector intrinsic implementation of it. The steps are: > > 1) Load the const "iota" vector. > > We extend the "`vector_iota_indices`" stubs from byte to other integral types. For floating point vectors, it needs an additional vector cast to get the right iota values. 
> > 2) Compute indexes with "`vec + iota * scale`" > > Here is the performance result to the new added micro benchmark on ARM NEON: > > Benchmark Gain > IndexVectorBenchmark.byteIndexVector 1.477 > IndexVectorBenchmark.doubleIndexVector 5.031 > IndexVectorBenchmark.floatIndexVector 5.342 > IndexVectorBenchmark.intIndexVector 5.529 > IndexVectorBenchmark.longIndexVector 3.177 > IndexVectorBenchmark.shortIndexVector 5.841 > > > Please help to review and share the feedback! Thanks in advance! Ping again, could anyone please help to take a review at this PR? Thanks in advance! ------------- PR: https://git.openjdk.org/jdk/pull/10332 From amenkov at openjdk.org Sat Oct 8 15:57:54 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Sat, 8 Oct 2022 15:57:54 GMT Subject: Integrated: 7124710: interleaved RedefineClasses() and RetransformClasses() calls may have a problem In-Reply-To: References: Message-ID: On Thu, 25 Aug 2022 21:16:22 GMT, Alex Menkov wrote: > The problem is RedefineClasses does not update cached_class_bytes, so subsequent RetransformClasses gets obsolete class bytes (this are testcases 3-6 from the new test) > > cached_class_bytes are set when an agent instruments the class from ClassFileLoadHook. > After successful RedefineClasses it should be reset. > The fix updates ClassFileLoadHook caller to not use old cached_class_bytes with RedefineClasses (if some agent instruments the class, new cached_class_bytes are allocated for scratch_class) and updates cached_class_bytes after successful RedefineClasses or RetransformClasses. This pull request has now been integrated. Changeset: 495c0435 Author: Alex Menkov URL: https://git.openjdk.org/jdk/commit/495c043533d68106e07721b2e971006e9eba97e3 Stats: 594 lines in 4 files changed: 577 ins; 9 del; 8 mod 7124710: interleaved RedefineClasses() and RetransformClasses() calls may have a problem Reviewed-by: sspitsyn, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/10032 From xlinzheng at openjdk.org Sat Oct 8 15:59:58 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Sat, 8 Oct 2022 15:59:58 GMT Subject: RFR: 8294366: RISC-V: Partially mark out incompressible regions [v2] In-Reply-To: <7aIMgQ7ci4WeF_NN0WAc5e5WwgUOYochPsDaW6a5P0Q=.d9d035be-958f-4c00-916f-1dc45fad6c0e@github.com> References: <7aIMgQ7ci4WeF_NN0WAc5e5WwgUOYochPsDaW6a5P0Q=.d9d035be-958f-4c00-916f-1dc45fad6c0e@github.com> Message-ID: > Shortly, the current RVC implementation in the RISC-V backend is a "whitelist mode", merely compressing instructions marked by "CompressibleRegion" that covers just part of C2 matching rules and stub code (only ~5% compression rate). Due to the originally existing large backend code base and to spread the coverage to nearly all instructions generated by the backend, we cannot modify them little by little, but should implement a "blacklist mode" (a compression rate to ~20% if complete), to exclude compressions from: > 1. relocations > 2. patchable instructions > 3. fixed length code slices whose code size is calculated > > Please check the discussions in the riscv-port mailing list[1] to go in for more details. > > This patch contains the first half of implementations which does not change the program behavior: it just introduces an "IncompressibleRegion" to indicate a piece of code "not compressible", marking out the patchable instructions and fixed-length code slices that are not able to compress by RVC. 
> > Besides, this patch also temporarily removes some automatic compression logic of branch instructions like "jal"s and "beq"s, for MachBranchNodes' fake labels could hamper the automatic compression as well and for making code clean. Please also check the discussions in the thread[1]. > > Please check the unsquashed commits to have a better review of the patch. > > > Tested a hotspot tier1 and tier2 with the option UseRVC turning on. > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html > > Thanks, > Xiaolin Xiaolin Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Plural form - comment polishment - patchable nop's code style - Merge remote-tracking branch 'github-openjdk/master' into riscv-rvc-checkin-first-half-part - Merge remote-tracking branch 'github-openjdk/master' into riscv-rvc-checkin-first-half-part - [5] RVC: IncompressibleRegions for patchable labels - [4] RVC: IncompressibleRegions for fixed length code slices - [3] RVC: IncompressibleRegions for patchable nops - [2] RVC: Disable auto transformations for branch instructions - [1] RVC: Add the IncompressibleRegion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10421/files - new: https://git.openjdk.org/jdk/pull/10421/files/692b8f86..fc3657c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10421&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10421&range=00-01 Stats: 25190 lines in 644 files changed: 15000 ins; 7871 del; 2319 mod Patch: https://git.openjdk.org/jdk/pull/10421.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10421/head:pull/10421 PR: https://git.openjdk.org/jdk/pull/10421 From yadongwang at openjdk.org Sat Oct 8 15:59:58 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Sat, 8 Oct 2022 15:59:58 GMT Subject: RFR: 8294366: RISC-V: Partially mark out incompressible regions [v2] In-Reply-To: References: <7aIMgQ7ci4WeF_NN0WAc5e5WwgUOYochPsDaW6a5P0Q=.d9d035be-958f-4c00-916f-1dc45fad6c0e@github.com> Message-ID: On Sat, 8 Oct 2022 06:42:19 GMT, Xiaolin Zheng wrote: >> Shortly, the current RVC implementation in the RISC-V backend is a "whitelist mode", merely compressing instructions marked by "CompressibleRegion" that covers just part of C2 matching rules and stub code (only ~5% compression rate). Due to the originally existing large backend code base and to spread the coverage to nearly all instructions generated by the backend, we cannot modify them little by little, but should implement a "blacklist mode" (a compression rate to ~20% if complete), to exclude compressions from: >> 1. relocations >> 2. patchable instructions >> 3. fixed length code slices whose code size is calculated >> >> Please check the discussions in the riscv-port mailing list[1] to go in for more details. >> >> This patch contains the first half of implementations which does not change the program behavior: it just introduces an "IncompressibleRegion" to indicate a piece of code "not compressible", marking out the patchable instructions and fixed-length code slices that are not able to compress by RVC. >> >> Besides, this patch also temporarily removes some automatic compression logic of branch instructions like "jal"s and "beq"s, for MachBranchNodes' fake labels could hamper the automatic compression as well and for making code clean. 
Please also check the discussions in the thread[1]. >> >> Please check the unsquashed commits to have a better review of the patch. >> >> >> Tested a hotspot tier1 and tier2 with the option UseRVC turning on. >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html >> >> Thanks, >> Xiaolin > > Xiaolin Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Plural form > - comment polishment > - patchable nop's code style > - Merge remote-tracking branch 'github-openjdk/master' into riscv-rvc-checkin-first-half-part > - Merge remote-tracking branch 'github-openjdk/master' into riscv-rvc-checkin-first-half-part > - [5] RVC: IncompressibleRegions for patchable labels > - [4] RVC: IncompressibleRegions for fixed length code slices > - [3] RVC: IncompressibleRegions for patchable nops > - [2] RVC: Disable auto transformations for branch instructions > - [1] RVC: Add the IncompressibleRegion lgtm ------------- Marked as reviewed by yadongwang (Author). PR: https://git.openjdk.org/jdk/pull/10421 From fyang at openjdk.org Sat Oct 8 15:59:59 2022 From: fyang at openjdk.org (Fei Yang) Date: Sat, 8 Oct 2022 15:59:59 GMT Subject: RFR: 8294366: RISC-V: Partially mark out incompressible regions [v2] In-Reply-To: References: <7aIMgQ7ci4WeF_NN0WAc5e5WwgUOYochPsDaW6a5P0Q=.d9d035be-958f-4c00-916f-1dc45fad6c0e@github.com> Message-ID: On Sat, 8 Oct 2022 06:42:19 GMT, Xiaolin Zheng wrote: >> Shortly, the current RVC implementation in the RISC-V backend is a "whitelist mode", merely compressing instructions marked by "CompressibleRegion" that covers just part of C2 matching rules and stub code (only ~5% compression rate). Due to the originally existing large backend code base and to spread the coverage to nearly all instructions generated by the backend, we cannot modify them little by little, but should implement a "blacklist mode" (a compression rate to ~20% if complete), to exclude compressions from: >> 1. relocations >> 2. patchable instructions >> 3. fixed length code slices whose code size is calculated >> >> Please check the discussions in the riscv-port mailing list[1] to go in for more details. >> >> This patch contains the first half of implementations which does not change the program behavior: it just introduces an "IncompressibleRegion" to indicate a piece of code "not compressible", marking out the patchable instructions and fixed-length code slices that are not able to compress by RVC. >> >> Besides, this patch also temporarily removes some automatic compression logic of branch instructions like "jal"s and "beq"s, for MachBranchNodes' fake labels could hamper the automatic compression as well and for making code clean. Please also check the discussions in the thread[1]. >> >> Please check the unsquashed commits to have a better review of the patch. >> >> >> Tested a hotspot tier1 and tier2 with the option UseRVC turning on. >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html >> >> Thanks, >> Xiaolin > > Xiaolin Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: > > - Plural form > - comment polishment > - patchable nop's code style > - Merge remote-tracking branch 'github-openjdk/master' into riscv-rvc-checkin-first-half-part > - Merge remote-tracking branch 'github-openjdk/master' into riscv-rvc-checkin-first-half-part > - [5] RVC: IncompressibleRegions for patchable labels > - [4] RVC: IncompressibleRegions for fixed length code slices > - [3] RVC: IncompressibleRegions for patchable nops > - [2] RVC: Disable auto transformations for branch instructions > - [1] RVC: Add the IncompressibleRegion Updated change looks good. Thanks. src/hotspot/cpu/riscv/assembler_riscv.hpp line 2113: > 2111: // > 2112: // 4. Using -XX:PrintAssemblyOptions=no-aliases could distinguish RVC instructions from > 2113: // normal ones. One more suggestion for this code comment: // ======================================== // RISC-V Compressed Instructions Extension // ======================================== // Note: // 1. Assembler functions encoding 16-bit compressed instructions always begin with a 'c_' // prefix, such as 'c_add'. Correspondingly, assembler functions encoding normal 32-bit // instructions with begin with a '_' prefix, such as "_add". Most of time users have no // need to explicitly emit these compressed instructions. Instead, they still use unified // wrappers such as 'add' which do the compressing work through 'c_add' depending on the // the operands of the instruction and availability of the RVC hardware extension. // // 2. 'CompressibleRegion' and 'IncompressibleRegion' are introduced to mark assembler scopes // within which instructions are qualified or unqualified to be compressed into their 16-bit // versions. An example: // // CompressibleRegion cr(_masm); // __ add(...); // this instruction will be compressed into 'c.and' when possible // { // IncompressibleRegion ir(_masm); // __ add(...); // this instruction will not be compressed // { // CompressibleRegion cr(_masm); // __ add(...); // this instruction will be compressed into 'c.and' when possible // } // } // // 3. When printing JIT assembly code, using -XX:PrintAssemblyOptions=no-aliases could help // distinguish compressed 16-bit instructions from normal 32-bit ones. ------------- Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/10421 From xlinzheng at openjdk.org Sat Oct 8 16:00:00 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Sat, 8 Oct 2022 16:00:00 GMT Subject: RFR: 8294366: RISC-V: Partially mark out incompressible regions [v2] In-Reply-To: References: <7aIMgQ7ci4WeF_NN0WAc5e5WwgUOYochPsDaW6a5P0Q=.d9d035be-958f-4c00-916f-1dc45fad6c0e@github.com> Message-ID: On Sat, 8 Oct 2022 04:57:05 GMT, Fei Yang wrote: >> Xiaolin Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: >> >> - Plural form >> - comment polishment >> - patchable nop's code style >> - Merge remote-tracking branch 'github-openjdk/master' into riscv-rvc-checkin-first-half-part >> - Merge remote-tracking branch 'github-openjdk/master' into riscv-rvc-checkin-first-half-part >> - [5] RVC: IncompressibleRegions for patchable labels >> - [4] RVC: IncompressibleRegions for fixed length code slices >> - [3] RVC: IncompressibleRegions for patchable nops >> - [2] RVC: Disable auto transformations for branch instructions >> - [1] RVC: Add the IncompressibleRegion > > Updated change looks good. Thanks. Thank @RealFYang and @yadongw for taking the time to review this! Tests on QEMU seem okay. An integration first so that it won't be blocked by myself. > src/hotspot/cpu/riscv/assembler_riscv.hpp line 2113: > >> 2111: // >> 2112: // 4. Using -XX:PrintAssemblyOptions=no-aliases could distinguish RVC instructions from >> 2113: // normal ones. > > One more suggestion for this code comment: > > // ======================================== > // RISC-V Compressed Instructions Extension > // ======================================== > // Note: > // 1. Assembler functions encoding 16-bit compressed instructions always begin with a 'c_' > // prefix, such as 'c_add'. Correspondingly, assembler functions encoding normal 32-bit > // instructions with begin with a '_' prefix, such as "_add". Most of time users have no > // need to explicitly emit these compressed instructions. Instead, they still use unified > // wrappers such as 'add' which do the compressing work through 'c_add' depending on the > // the operands of the instruction and availability of the RVC hardware extension. > // > // 2. 'CompressibleRegion' and 'IncompressibleRegion' are introduced to mark assembler scopes > // within which instructions are qualified or unqualified to be compressed into their 16-bit > // versions. An example: > // > // CompressibleRegion cr(_masm); > // __ add(...); // this instruction will be compressed into 'c.and' when possible > // { > // IncompressibleRegion ir(_masm); > // __ add(...); // this instruction will not be compressed > // { > // CompressibleRegion cr(_masm); > // __ add(...); // this instruction will be compressed into 'c.and' when possible > // } > // } > // > // 3. When printing JIT assembly code, using -XX:PrintAssemblyOptions=no-aliases could help > // distinguish compressed 16-bit instructions from normal 32-bit ones. Thanks for the polish and it looks more concise than before. The other one, the code style of nop is fixed as well. ------------- PR: https://git.openjdk.org/jdk/pull/10421 From xlinzheng at openjdk.org Sat Oct 8 16:00:03 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Sat, 8 Oct 2022 16:00:03 GMT Subject: Integrated: 8294366: RISC-V: Partially mark out incompressible regions In-Reply-To: <7aIMgQ7ci4WeF_NN0WAc5e5WwgUOYochPsDaW6a5P0Q=.d9d035be-958f-4c00-916f-1dc45fad6c0e@github.com> References: <7aIMgQ7ci4WeF_NN0WAc5e5WwgUOYochPsDaW6a5P0Q=.d9d035be-958f-4c00-916f-1dc45fad6c0e@github.com> Message-ID: On Mon, 26 Sep 2022 12:19:15 GMT, Xiaolin Zheng wrote: > Shortly, the current RVC implementation in the RISC-V backend is a "whitelist mode", merely compressing instructions marked by "CompressibleRegion" that covers just part of C2 matching rules and stub code (only ~5% compression rate). 
Due to the originally existing large backend code base and to spread the coverage to nearly all instructions generated by the backend, we cannot modify them little by little, but should implement a "blacklist mode" (a compression rate to ~20% if complete), to exclude compressions from: > 1. relocations > 2. patchable instructions > 3. fixed length code slices whose code size is calculated > > Please check the discussions in the riscv-port mailing list[1] to go in for more details. > > This patch contains the first half of implementations which does not change the program behavior: it just introduces an "IncompressibleRegion" to indicate a piece of code "not compressible", marking out the patchable instructions and fixed-length code slices that are not able to compress by RVC. > > Besides, this patch also temporarily removes some automatic compression logic of branch instructions like "jal"s and "beq"s, for MachBranchNodes' fake labels could hamper the automatic compression as well and for making code clean. Please also check the discussions in the thread[1]. > > Please check the unsquashed commits to have a better review of the patch. > > > Tested a hotspot tier1 and tier2 with the option UseRVC turning on. > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html > > Thanks, > Xiaolin This pull request has now been integrated. Changeset: 542cc602 Author: Xiaolin Zheng Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/542cc602a7f023d3351133a321c4fa57249b8765 Stats: 111 lines in 8 files changed: 50 ins; 36 del; 25 mod 8294366: RISC-V: Partially mark out incompressible regions Reviewed-by: fyang, yadongwang ------------- PR: https://git.openjdk.org/jdk/pull/10421 From kbarrett at openjdk.org Sat Oct 8 22:41:31 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 8 Oct 2022 22:41:31 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v4] In-Reply-To: References: Message-ID: > 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal > 8155996: Improve concurrent refinement green zone control > 8134303: Introduce -XX:-G1UseConcRefinement > > Please review this change to the control of concurrent refinement. > > This new controller takes a different approach to the problem, addressing a > number of issues. > > The old controller used a multiple of the target number of cards to determine > the range over which increasing numbers of refinement threads should be > activated, and finally activating mutator refinement. This has a variety of > problems. It doesn't account for the processing rate, the rate of new dirty > cards, or the time available to perform the processing. This often leads to > unnecessary spikes in the number of running refinement threads. It also tends > to drive the pending number to the target quickly and keep it there, removing > the benefit from having pending dirty cards filter out new cards for nearby > writes. It can't delay and leave excess cards in the queue because it could > be a long time before another buffer is enqueued. > > The old controller was triggered by mutator threads enqueing card buffers, > when the number of cards in the queue exceeded a threshold near the target. > This required a complex activation protocol between the mutators and the > refinement threads. 
> > With the new controller there is a primary refinement thread that periodically > estimates how many refinement threads need to be running to reach the target > in time for the next GC, along with whether to also activate mutator > refinement. If the primary thread stops running because it isn't currently > needed, it sleeps for a period and reevaluates on wakeup. This eliminates any > involvement in the activation of refinement threads by mutator threads. > > The estimate of how many refinement threads are needed uses a prediction of > time until the next GC, the number of buffered cards, the predicted rate of > new dirty cards, and the predicted refinement rate. The number of running > threads is adjusted based on these periodically performed estimates. > > This new approach allows more dirty cards to be left in the queue until late > in the mutator phase, typically reducing the rate of new dirty cards, which > reduces the amount of concurrent refinement work needed. > > It also smooths out the number of running refinement threads, eliminating the > unnecessarily large spikes that are common with the old method. One benefit > is that the number of refinement threads (lazily) allocated is often much > lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem > described in JDK-8153225.) > > This change also provides a new method for calculating for the number of dirty > cards that should be pending at the start of a GC. While this calculation is > conceptually distinct from the thread control, the two were significanly > intertwined in the old controller. Changing this calculation separately and > first would have changed the behavior of the old controller in ways that might > have introduced regressions. Changing it after the thread control was changed > would have made it more difficult to test and measure the thread control in a > desirable configuration. > > The old calculation had various problems that are described in JDK-8155996. > In particular, it can get more or less stuck at low values, and is slow to > respond to changes. > > The old controller provided a number of product options, none of which were > very useful for real applications, and none of which are very applicable to > the new controller. All of these are being obsoleted. > > -XX:-G1UseAdaptiveConcRefinement > -XX:G1ConcRefinementGreenZone= > -XX:G1ConcRefinementYellowZone= > -XX:G1ConcRefinementRedZone= > -XX:G1ConcRefinementThresholdStep= > > The new controller *could* use G1ConcRefinementGreenZone to provide a fixed > value for the target number of cards, though it is poorly named for that. > > A configuration that was useful for some kinds of debugging and testing was to > disable G1UseAdaptiveConcRefinement and set g1ConcRefinementGreenZone to a > very large value, effectively disabling concurrent refinement. To support > this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic > option has been added (see JDK-8155996). > > The other options are meaningless for the new controller. > > Because of these option changes, a CSR and a release note need to accompany > this change. > > Testing: > mach5 tier1-6 > various performance tests. > local (linux-x64) tier1 with -XX:-G1UseConcRefinement > > Performance testing found no regressions, but also little or no improvement > with default options, which was expected. With default options most of our > performance tests do very little concurrent refinement. 
And even for those > that do, while the old controller had a number of problems, the impact of > those problems is small and hard to measure for most applications. > > When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare > better, particularly when also reducing MaxGCPauseMillis. specjbb2015 with > MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options > held constant) showed a statistically significant improvement of about 4.5% > for critical-jOPS. Using the changed controller, the difference between this > configuration and the default is fairly small, while the baseline shows > significant degradation with the more restrictive options. > > For all tests and configurations the new controller often creates many fewer > refinement threads. Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: - tschatzl comments - changed threads wanted logging per kstefanj - s/max_cards/mutator_refinement_threshold/ ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10256/files - new: https://git.openjdk.org/jdk/pull/10256/files/9a735bc0..a4bcbafd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10256&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10256&range=02-03 Stats: 30 lines in 5 files changed: 2 ins; 2 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/10256.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10256/head:pull/10256 PR: https://git.openjdk.org/jdk/pull/10256 From kbarrett at openjdk.org Sat Oct 8 23:34:37 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 8 Oct 2022 23:34:37 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v5] In-Reply-To: References: Message-ID: > 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal > 8155996: Improve concurrent refinement green zone control > 8134303: Introduce -XX:-G1UseConcRefinement > > Please review this change to the control of concurrent refinement. > > This new controller takes a different approach to the problem, addressing a > number of issues. > > The old controller used a multiple of the target number of cards to determine > the range over which increasing numbers of refinement threads should be > activated, and finally activating mutator refinement. This has a variety of > problems. It doesn't account for the processing rate, the rate of new dirty > cards, or the time available to perform the processing. This often leads to > unnecessary spikes in the number of running refinement threads. It also tends > to drive the pending number to the target quickly and keep it there, removing > the benefit from having pending dirty cards filter out new cards for nearby > writes. It can't delay and leave excess cards in the queue because it could > be a long time before another buffer is enqueued. > > The old controller was triggered by mutator threads enqueing card buffers, > when the number of cards in the queue exceeded a threshold near the target. > This required a complex activation protocol between the mutators and the > refinement threads. > > With the new controller there is a primary refinement thread that periodically > estimates how many refinement threads need to be running to reach the target > in time for the next GC, along with whether to also activate mutator > refinement. If the primary thread stops running because it isn't currently > needed, it sleeps for a period and reevaluates on wakeup. 
This eliminates any > involvement in the activation of refinement threads by mutator threads. > > The estimate of how many refinement threads are needed uses a prediction of > time until the next GC, the number of buffered cards, the predicted rate of > new dirty cards, and the predicted refinement rate. The number of running > threads is adjusted based on these periodically performed estimates. > > This new approach allows more dirty cards to be left in the queue until late > in the mutator phase, typically reducing the rate of new dirty cards, which > reduces the amount of concurrent refinement work needed. > > It also smooths out the number of running refinement threads, eliminating the > unnecessarily large spikes that are common with the old method. One benefit > is that the number of refinement threads (lazily) allocated is often much > lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem > described in JDK-8153225.) > > This change also provides a new method for calculating for the number of dirty > cards that should be pending at the start of a GC. While this calculation is > conceptually distinct from the thread control, the two were significanly > intertwined in the old controller. Changing this calculation separately and > first would have changed the behavior of the old controller in ways that might > have introduced regressions. Changing it after the thread control was changed > would have made it more difficult to test and measure the thread control in a > desirable configuration. > > The old calculation had various problems that are described in JDK-8155996. > In particular, it can get more or less stuck at low values, and is slow to > respond to changes. > > The old controller provided a number of product options, none of which were > very useful for real applications, and none of which are very applicable to > the new controller. All of these are being obsoleted. > > -XX:-G1UseAdaptiveConcRefinement > -XX:G1ConcRefinementGreenZone= > -XX:G1ConcRefinementYellowZone= > -XX:G1ConcRefinementRedZone= > -XX:G1ConcRefinementThresholdStep= > > The new controller *could* use G1ConcRefinementGreenZone to provide a fixed > value for the target number of cards, though it is poorly named for that. > > A configuration that was useful for some kinds of debugging and testing was to > disable G1UseAdaptiveConcRefinement and set g1ConcRefinementGreenZone to a > very large value, effectively disabling concurrent refinement. To support > this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic > option has been added (see JDK-8155996). > > The other options are meaningless for the new controller. > > Because of these option changes, a CSR and a release note need to accompany > this change. > > Testing: > mach5 tier1-6 > various performance tests. > local (linux-x64) tier1 with -XX:-G1UseConcRefinement > > Performance testing found no regressions, but also little or no improvement > with default options, which was expected. With default options most of our > performance tests do very little concurrent refinement. And even for those > that do, while the old controller had a number of problems, the impact of > those problems is small and hard to measure for most applications. > > When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare > better, particularly when also reducing MaxGCPauseMillis. 
specjbb2015 with > MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options > held constant) showed a statistically significant improvement of about 4.5% > for critical-jOPS. Using the changed controller, the difference between this > configuration and the default is fairly small, while the baseline shows > significant degradation with the more restrictive options. > > For all tests and configurations the new controller often creates many fewer > refinement threads. Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: comments around alloc_bytes_rate being zero ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10256/files - new: https://git.openjdk.org/jdk/pull/10256/files/a4bcbafd..16432b12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10256&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10256&range=03-04 Stats: 7 lines in 1 file changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10256.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10256/head:pull/10256 PR: https://git.openjdk.org/jdk/pull/10256 From kbarrett at openjdk.org Sat Oct 8 23:34:38 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 8 Oct 2022 23:34:38 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v3] In-Reply-To: References: Message-ID: On Tue, 27 Sep 2022 12:12:05 GMT, Thomas Schatzl wrote: >>> A zero value for the prediction indicates that we don't have a valid >> prediction >> >> Why not? It's still possible that the alloc-rate is zero after start-up; I mean alloc-rate is up to applications. >> >> On a related note, there's special treatment for too-close upcoming GC pause later on, `if (_predicted_time_until_next_gc_ms > _update_period_ms) {`. Shouldn't there be sth similar for too-far upcoming GC pause? IOW, `incoming_rate * _predicted_time_until_next_gc_ms;` would be unreliable for farther prediction, right? > >> Why not? It's still possible that the alloc-rate is zero after start-up; I mean alloc-rate is up to applications. > > Allocation rate is the rate of allocation between GCs, to have a GC, you (almost) need non-zero allocation rate (not with periodic gcs). I've added some comments regarding the alloc_bytes_rate == 0 case. I don't think we need to do anything special for predicted GC times being far away, because we'll be periodically re-evaluating. ------------- PR: https://git.openjdk.org/jdk/pull/10256 From xgong at openjdk.org Sun Oct 9 06:04:05 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Sun, 9 Oct 2022 06:04:05 GMT Subject: RFR: 8294261: AArch64: Use pReg instead of pRegGov when possible In-Reply-To: References: Message-ID: <9Ov_yqj4jJ6VH7DVku23aOX2jyCdBFkscnq4tSIZCHE=.657dda61-0875-4a41-b1e5-2963af0b266b@github.com> On Wed, 28 Sep 2022 05:52:40 GMT, Ningsheng Jian wrote: > Currently we allocate SVE predicate register p0-p6 for pRegGov operand, which are used as governing predicates for load/store and arithmetic, and also define pReg operand for all allocatable predicate registers. Since some SVE instructions are fine to use/define p8-p15, e.g. predicate operations, this patch makes the matcher work for mixed use of pRegGov and pReg, and tries to match pReg when possible. If a predicate reg is defined as pReg but used as pRegGov, register allocator will handle that properly. > > With p8-p15 being used as non-temp register, we need to save them as well when saving all registers. 
The code of setting the predicate reg slot in the OopMap in RegisterSaver::save_live_registers() is also removed, because at a safepoint, vector masks have been transformed to vectors [1]. > > Tested on different SVE systems. Also tested by making the RA allocate p8-p15 first for the vReg operand, so that a p8-p15 reg has a higher chance of being allocated, and if an SVE instruction emitted by an ad rule does not accept p8-p15, the assembler will crash. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vector.cpp#L265 LGTM! ------------- Marked as reviewed by xgong (Committer). PR: https://git.openjdk.org/jdk/pull/10461 From jwaters at openjdk.org Sun Oct 9 08:14:46 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 9 Oct 2022 08:14:46 GMT Subject: RFR: 8295017: Remove Windows specific workaround in JLI_Snprintf Message-ID: The C99 snprintf is available with Visual Studio 2015 and above, alongside Windows 10 and the UCRT, and is no longer identical to the outdated Windows _snprintf. Since support for the Visual C++ 2017 compiler was removed a while ago, we can now safely remove the compatibility workaround on Windows and have JLI_Snprintf simply delegate to snprintf. ------------- Commit messages: - Remove Windows specific JLI_Snprintf implementation - Remove Windows JLI_Snprintf definition Changes: https://git.openjdk.org/jdk/pull/10625/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10625&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295017 Stats: 43 lines in 2 files changed: 9 ins; 34 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10625.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10625/head:pull/10625 PR: https://git.openjdk.org/jdk/pull/10625 From jwaters at openjdk.org Sun Oct 9 08:17:15 2022 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 9 Oct 2022 08:17:15 GMT Subject: RFR: 8295017: Remove Windows specific workaround in JLI_Snprintf [v2] In-Reply-To: References: Message-ID: > The C99 snprintf is available with Visual Studio 2015 and above, alongside Windows 10 and the UCRT, and is no longer identical to the outdated Windows _snprintf. Since support for the Visual C++ 2017 compiler was removed a while ago, we can now safely remove the compatibility workaround on Windows and have JLI_Snprintf simply delegate to snprintf. Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Comment formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10625/files - new: https://git.openjdk.org/jdk/pull/10625/files/d021c774..35de1467 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10625&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10625&range=00-01 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10625.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10625/head:pull/10625 PR: https://git.openjdk.org/jdk/pull/10625 From kbarrett at openjdk.org Sun Oct 9 18:00:57 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 9 Oct 2022 18:00:57 GMT Subject: RFR: 8295017: Remove Windows specific workaround in JLI_Snprintf [v2] In-Reply-To: References: Message-ID: On Sun, 9 Oct 2022 08:17:15 GMT, Julian Waters wrote: >> The C99 snprintf is available with Visual Studio 2015 and above, alongside Windows 10 and the UCRT, and is no longer identical to the outdated Windows _snprintf.
Since support for the Visual C++ 2017 compiler was removed a while ago, we can now safely remove the compatibility workaround on Windows and have JLI_Snprintf simply delegate to snprintf. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Comment formatting src/java.base/share/native/libjli/jli_util.h line 91: > 89: * https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference > 90: * /snprintf-snprintf-snprintf-l-snwprintf-snwprintf-l?view=msvc-170 > 91: */ I don't think the comment about the *lack* of a workaround is needed, just adding clutter. But this isn't code I have much involvement with. Other than that, the change looks fine. ------------- PR: https://git.openjdk.org/jdk/pull/10625 From dholmes at openjdk.org Mon Oct 10 04:39:01 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 10 Oct 2022 04:39:01 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v14] In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 12:45:08 GMT, Julian Waters wrote: >> Please review a small patch for dumping the failure reason when the MSVCRT libraries or the Java Virtual Machine fails to load on Windows, which can provide invaluable insight when debugging related launcher issues. >> >> See https://bugs.openjdk.org/browse/JDK-8292016 and the related Pull Request for the reason that the existing JLI error reporting utility was not used in this enhancement > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - Merge branch 'openjdk:master' into patch-4 > - Merge branch 'openjdk:master' into patch-4 > - Use - instead of : as a separator > - Merge branch 'openjdk:master' into patch-4 > - Make DLL_ERROR4 look a little better without changing what it means > - Revert changes to JLI_ReportErrorMessageSys > - Update java_md.c > - Update java_md.h > - Merge branch 'openjdk:master' into patch-4 > - Merge branch 'openjdk:master' into patch-4 > - ... and 6 more: https://git.openjdk.org/jdk/compare/fe291396...c3113cac src/java.base/windows/native/libjli/java_md.h line 48: > 46: */ > 47: > 48: void reportWithLastWindowsError(const char* message, ...); Why does this need to be exported in the header file? Are you expecting other code to call this? ------------- PR: https://git.openjdk.org/jdk/pull/9749 From dholmes at openjdk.org Mon Oct 10 05:17:41 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 10 Oct 2022 05:17:41 GMT Subject: RFR: 8295017: Remove Windows specific workaround in JLI_Snprintf [v2] In-Reply-To: References: Message-ID: On Sun, 9 Oct 2022 08:17:15 GMT, Julian Waters wrote: >> The C99 snprintf is available with Visual Studio 2015 and above, alongside Windows 10 and the UCRT, and is no longer identical to the outdated Windows _snprintf. Since support for the Visual C++ 2017 compiler was removed a while ago, we can now safely remove the compatibility workaround on Windows and have JLI_Snprintf simply delegate to snprintf. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Comment formatting Looks good modulo the comment block. 
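For readers following the thread, the simplified shape of the delegation being reviewed is roughly the following (an illustrative sketch only; the wrapper's exact signature is assumed from jli_util.h and may differ in detail from the actual patch):

    #include <stdarg.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Sketch: with a conforming C99 snprintf available on every supported
       toolchain, the JLI wrapper can simply forward to vsnprintf. */
    int JLI_Snprintf(char* buffer, size_t size, const char* format, ...) {
        va_list vl;
        int rc;
        va_start(vl, format);
        rc = vsnprintf(buffer, size, format, vl);
        va_end(vl);
        return rc;
    }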
------------- PR: https://git.openjdk.org/jdk/pull/10625 From dholmes at openjdk.org Mon Oct 10 05:17:43 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 10 Oct 2022 05:17:43 GMT Subject: RFR: 8295017: Remove Windows specific workaround in JLI_Snprintf [v2] In-Reply-To: References: Message-ID: On Sun, 9 Oct 2022 17:58:37 GMT, Kim Barrett wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Comment formatting > > src/java.base/share/native/libjli/jli_util.h line 91: > >> 89: * https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference >> 90: * /snprintf-snprintf-snprintf-l-snwprintf-snwprintf-l?view=msvc-170 >> 91: */ > > I don't think the comment about the *lack* of a workaround is needed, just adding clutter. But this isn't code I have much involvement with. Other than that, the change looks fine. I agree, we don't document the absence of a workaround. ------------- PR: https://git.openjdk.org/jdk/pull/10625 From aturbanov at openjdk.org Mon Oct 10 10:11:16 2022 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 10 Oct 2022 10:11:16 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v2] In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 17:31:00 GMT, Serguei Spitsyn wrote: >> The spec of JVM TI GetLocalXXX/SetLocalXXX functions is updated to require the target thread to be suspended. If not suspended then the JVMTI_ERROR_THREAD_NOT_SUSPENDED error code is returned by the implementation. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 >> >> A few tests are impacted by this fix: >> >> test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest >> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest >> test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 >> >> >> The following test has been removed as non-relevant any more: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` >> >> New negative test has been added instead: >> ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` >> >> All JVM TI and JPDA tests were used locally for verification. >> They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. >> >> Mach5 test runs on all platforms are TBD. 
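To make the behavioral change concrete, here is a minimal agent-side sketch (hypothetical illustration written for this archive, not code from the webrev; it assumes the can_access_local_variables and can_suspend capabilities are enabled):

    #include <jvmti.h>

    // Sketch: under the updated spec, GetLocal*/SetLocal* on a thread that is
    // neither suspended nor the current thread fails with
    // JVMTI_ERROR_THREAD_NOT_SUSPENDED; the caller must suspend it first.
    static jint read_local_int(jvmtiEnv* jvmti, jthread thread) {
      jint value = 0;
      jvmtiError err = jvmti->GetLocalInt(thread, 0 /* depth */, 0 /* slot */, &value);
      if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
        jvmti->SuspendThread(thread);
        err = jvmti->GetLocalInt(thread, 0, 0, &value);
        jvmti->ResumeThread(thread);
      }
      return (err == JVMTI_ERROR_NONE) ? value : 0;
    }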
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comments about is_JavaThread_current and @enablePreview tag test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java line 39: > 37: static native void testUnsuspendedThread(Thread thread); > 38: > 39: static private volatile boolean doStop; Let's use default modifiers order Suggestion: private static volatile boolean doStop; test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java line 41: > 39: static private volatile boolean doStop; > 40: > 41: static private void sleep(long millis) { Let's use default modifiers order Suggestion: private static void sleep(long millis) { ------------- PR: https://git.openjdk.org/jdk/pull/10586 From jwaters at openjdk.org Mon Oct 10 11:59:56 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 10 Oct 2022 11:59:56 GMT Subject: RFR: 8295017: Remove Windows specific workaround in JLI_Snprintf [v2] In-Reply-To: References: Message-ID: <_02cjOK0AMcoTHXGR7nW9TTqm-2ECZCXj6iHgWHZt9k=.c3277263-7aeb-4d3d-a0ac-5afb0f749047@github.com> On Mon, 10 Oct 2022 05:06:00 GMT, David Holmes wrote: >> src/java.base/share/native/libjli/jli_util.h line 91: >> >>> 89: * https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference >>> 90: * /snprintf-snprintf-snprintf-l-snwprintf-snwprintf-l?view=msvc-170 >>> 91: */ >> >> I don't think the comment about the *lack* of a workaround is needed, just adding clutter. But this isn't code I have much involvement with. Other than that, the change looks fine. > > I agree, we don't document the absence of a workaround. Removed the comment as requested ------------- PR: https://git.openjdk.org/jdk/pull/10625 From jwaters at openjdk.org Mon Oct 10 12:00:53 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 10 Oct 2022 12:00:53 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v14] In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 04:35:12 GMT, David Holmes wrote: >> Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into patch-4 >> - Merge branch 'openjdk:master' into patch-4 >> - Use - instead of : as a separator >> - Merge branch 'openjdk:master' into patch-4 >> - Make DLL_ERROR4 look a little better without changing what it means >> - Revert changes to JLI_ReportErrorMessageSys >> - Update java_md.c >> - Update java_md.h >> - Merge branch 'openjdk:master' into patch-4 >> - Merge branch 'openjdk:master' into patch-4 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/ad28ef9e...c3113cac > > src/java.base/windows/native/libjli/java_md.h line 48: > >> 46: */ >> 47: >> 48: void reportWithLastWindowsError(const char* message, ...); > > Why does this need to be exported in the header file? Are you expecting other code to call this? 
I left that in there since it would possibly be a useful utility to have for other JLI code that might need to work with Windows errors in the future ------------- PR: https://git.openjdk.org/jdk/pull/9749 From jwaters at openjdk.org Mon Oct 10 11:59:55 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 10 Oct 2022 11:59:55 GMT Subject: RFR: 8295017: Remove Windows specific workaround in JLI_Snprintf [v3] In-Reply-To: References: Message-ID: > The C99 snprintf is available with Visual Studio 2015 and above, alongside Windows 10 and the UCRT, and is no longer identical to the outdated Windows _snprintf. Since support for the Visual C++ 2017 compiler was removed a while ago, we can now safely remove the compatibility workaround on Windows and have JLI_Snprintf simply delegate to snprintf. Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Comment documenting change isn't required - Merge branch 'openjdk:master' into patch-1 - Comment formatting - Remove Windows specific JLI_Snprintf implementation - Remove Windows JLI_Snprintf definition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10625/files - new: https://git.openjdk.org/jdk/pull/10625/files/35de1467..9149aae1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10625&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10625&range=01-02 Stats: 104 lines in 6 files changed: 2 ins; 97 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10625.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10625/head:pull/10625 PR: https://git.openjdk.org/jdk/pull/10625 From jwaters at openjdk.org Mon Oct 10 12:05:59 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 10 Oct 2022 12:05:59 GMT Subject: RFR: 8292016: Cleanup legacy error reporting in the JDK outside of HotSpot [v35] In-Reply-To: References: Message-ID: <6WwMRowvN-kQ7FarkLaqvoTzuri379ZdSHuTiwlXbXo=.32112bd1-0aa0-46a6-8b35-5209cc969d3d@github.com> On Mon, 3 Oct 2022 13:37:38 GMT, Julian Waters wrote: >> A large section of error reporting code in the JDK does not properly handle WIN32 API errors and instead mixes them with errors originating from C. Since they can be rather easily replaced and coming up with an elegant solution proved to be too much of a hassle to be worth it, and some of the concerns they address no longer are an issue with current versions of the platforms supported by the JDK, they can be easily removed without much effect. The remaining utilities that are still needed now instead report directly from strerror, with a new subsystem for WIN32 errors put in place wherever required, to minimize confusion when they are used, which was a problem with earlier solutions to this issue. 
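The split being described roughly separates the two error domains as follows (an illustrative sketch with made-up function names, not code from the withdrawn patch):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #ifdef _WIN32
    #include <windows.h>
    #endif

    // Errors visible to the C runtime (errno) are reported via strerror.
    static void report_runtime_error(const char* msg) {
        fprintf(stderr, "%s: %s\n", msg, strerror(errno));
    }

    #ifdef _WIN32
    // Errors from Windows API calls (GetLastError) go through FormatMessage.
    static void report_system_error(const char* msg) {
        char buf[256];
        FormatMessageA(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                       NULL, GetLastError(), 0, buf, (DWORD) sizeof(buf), NULL);
        fprintf(stderr, "%s: %s\n", msg, buf);
    }
    #endif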
> > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Naming Closing - Will narrow the fix to JLI for now in another changeset ------------- PR: https://git.openjdk.org/jdk/pull/9870 From jwaters at openjdk.org Mon Oct 10 12:05:59 2022 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 10 Oct 2022 12:05:59 GMT Subject: Withdrawn: 8292016: Cleanup legacy error reporting in the JDK outside of HotSpot In-Reply-To: References: Message-ID: On Sun, 14 Aug 2022 16:21:31 GMT, Julian Waters wrote: > A large section of error reporting code in the JDK does not properly handle WIN32 API errors and instead mixes them with errors originating from C. Since they can be rather easily replaced and coming up with an elegant solution proved to be too much of a hassle to be worth it, and some of the concerns they address no longer are an issue with current versions of the platforms supported by the JDK, they can be easily removed without much effect. The remaining utilities that are still needed now instead report directly from strerror, with a new subsystem for WIN32 errors put in place wherever required, to minimize confusion when they are used, which was a problem with earlier solutions to this issue. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/9870 From mdoerr at openjdk.org Mon Oct 10 14:15:55 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 10 Oct 2022 14:15:55 GMT Subject: RFR: 8295069: [PPC64, possibly aarch64] Performance regression after JDK-8290025 Message-ID: I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. May also be relevant for aarch64 which I haven't checked. I'm checking explicitly for GCs which are known to be safe. Maybe there is a better way to check this. ------------- Commit messages: - 8295069: [PPC64, possibly aarch64] Performance regression after JDK-8290025 Changes: https://git.openjdk.org/jdk/pull/10632/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10632&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295069 Stats: 10 lines in 2 files changed: 6 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10632.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10632/head:pull/10632 PR: https://git.openjdk.org/jdk/pull/10632 From mdoerr at openjdk.org Mon Oct 10 14:37:15 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 10 Oct 2022 14:37:15 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 14:02:09 GMT, Martin Doerr wrote: > I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. May also be relevant for aarch64 which I haven't checked. > I'm checking explicitly for GCs which are known to be safe. Maybe there is a better way to check this. Changed to draft. Aarch64 seems to have a better solution. I should probably implement it the same way. 
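As a rough illustration of the kind of explicit GC check being discussed (a sketch under assumed semantics, not the actual webrev; the two flags stand in for HotSpot's UseZGC and UseShenandoahGC globals):

    // Sketch: only collectors that patch nmethod data concurrently (ZGC,
    // Shenandoah) need load ordering in the nmethod entry barrier fast path;
    // stop-the-world patching collectors such as G1/Parallel/Serial can skip
    // the memory fence after loading the guard value.
    static bool entry_barrier_needs_ordering(bool use_zgc, bool use_shenandoah) {
      return use_zgc || use_shenandoah;
    }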
------------- PR: https://git.openjdk.org/jdk/pull/10632 From ngasson at openjdk.org Mon Oct 10 14:41:33 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Mon, 10 Oct 2022 14:41:33 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 14:02:09 GMT, Martin Doerr wrote: > I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. May also be relevant for aarch64 which I haven't checked. > I'm checking explicitly for GCs which are known to be safe. Maybe there is a better way to check this. This was fixed for AArch64 in JDK-8290700 - maybe it's worth adopting the `NMethodPatchingType` enum used there? ------------- PR: https://git.openjdk.org/jdk/pull/10632 From mdoerr at openjdk.org Mon Oct 10 15:07:43 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 10 Oct 2022 15:07:43 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 14:02:09 GMT, Martin Doerr wrote: > I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. May also be relevant for aarch64 which I haven't checked. > I'm checking explicitly for GCs which are known to be safe. Maybe there is a better way to check this. Thanks for the hint! We had already noticed and I was working on it in the meantime. ------------- PR: https://git.openjdk.org/jdk/pull/10632 From mdoerr at openjdk.org Mon Oct 10 15:12:51 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 10 Oct 2022 15:12:51 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 [v2] In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 15:07:43 GMT, Martin Doerr wrote: >> I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. Implemented like on aarch64, now. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Implement like on aarch64. Note: `conc_instruction_and_data_patch` is for generational ZGC which is not yet in jdk master. PPC64 implementation will still work if we forget to update from `conc_data_patch`. ------------- PR: https://git.openjdk.org/jdk/pull/10632 From mdoerr at openjdk.org Mon Oct 10 15:07:43 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 10 Oct 2022 15:07:43 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 [v2] In-Reply-To: References: Message-ID: > I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. May also be relevant for aarch64 which I haven't checked. > I'm checking explicitly for GCs which are known to be safe. Maybe there is a better way to check this. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Implement like on aarch64. 
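For context, the AArch64 classification referred to in the note above has roughly the following shape (reproduced from memory as a sketch; the authoritative definition lives in the AArch64 BarrierSetAssembler):

    // Sketch: the kind of concurrent nmethod patching a collector performs
    // determines how strong the entry-barrier ordering must be.
    enum class NMethodPatchingType {
      default_patching,                 // STW patching only: a plain load of the guard suffices
      conc_instruction_and_data_patch,  // instructions and data patched concurrently
                                        // (generational ZGC): uses the patching-epoch scheme
      conc_data_patch                   // only data (embedded oops) patched concurrently
                                        // (ZGC, Shenandoah): needs LoadLoad ordering
    };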
------------- Changes: - all: https://git.openjdk.org/jdk/pull/10632/files - new: https://git.openjdk.org/jdk/pull/10632/files/7c19cb17..99d133ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10632&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10632&range=00-01 Stats: 17 lines in 5 files changed: 13 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10632.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10632/head:pull/10632 PR: https://git.openjdk.org/jdk/pull/10632 From aph at openjdk.org Mon Oct 10 17:29:02 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 10 Oct 2022 17:29:02 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 [v2] In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 15:07:43 GMT, Martin Doerr wrote: >> I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. Implemented like on aarch64, now. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Implement like on aarch64. Might it make sense for PPC to do what AArch64 does? We don't need any memory fence instructions on the fast path. ------------- PR: https://git.openjdk.org/jdk/pull/10632 From kbarrett at openjdk.org Mon Oct 10 19:48:46 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 10 Oct 2022 19:48:46 GMT Subject: RFR: 8295017: Remove Windows specific workaround in JLI_Snprintf [v3] In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 11:59:55 GMT, Julian Waters wrote: >> The C99 snprintf is available with Visual Studio 2015 and above, alongside Windows 10 and the UCRT, and is no longer identical to the outdated Windows _snprintf. Since support for the Visual C++ 2017 compiler was removed a while ago, we can now safely remove the compatibility workaround on Windows and have JLI_Snprintf simply delegate to snprintf. > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Comment documenting change isn't required > - Merge branch 'openjdk:master' into patch-1 > - Comment formatting > - Remove Windows specific JLI_Snprintf implementation > - Remove Windows JLI_Snprintf definition Looks good. ------------- PR: https://git.openjdk.org/jdk/pull/10625 From svkamath at openjdk.org Mon Oct 10 20:15:54 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 10 Oct 2022 20:15:54 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v8] In-Reply-To: References: <8LEJXqdKQPQe3lNuMSQql9YLgbcESJzfupkgORdvsFc=.807157d6-4506-4f04-ba20-a032d6ba973c@github.com> Message-ID: <8Hz7TtN3qVWn324XlTdBZCCdUbSQfBFwNudrf65mMIs=.9b78e1af-2a21-469e-961b-607e36884637@github.com> On Fri, 30 Sep 2022 17:24:31 GMT, Vladimir Kozlov wrote: >> @vnkozlov I spoke too soon. All the GHA tests passed in the dummy draft PR I created using Smita's patch: >> https://github.com/openjdk/jdk/pull/10500 >> Please take a look. No build failures reported and all tier1 tests passed. > >> @sviswa7 The failure is due to [JDK-8293618](https://bugs.openjdk.org/browse/JDK-8293618), @smita-kamath please merge with master. Thanks. > > Yes, I tested with latest JDK sources which includes JDK-8293618. @vnkozlov, I have implemented all of the reviewers comments. 
Could you kindly test this patch? Thanks a lot for your help. ------------- PR: https://git.openjdk.org/jdk/pull/9781 From mdoerr at openjdk.org Mon Oct 10 21:00:49 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 10 Oct 2022 21:00:49 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 [v2] In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 17:26:49 GMT, Andrew Haley wrote: > Might it make sense for PPC to do what AArch64 does? We don't need any memory fence instructions on the fast path. We are actually doing what AArch64 does, now. AArch64 still uses `membar(__ LoadLoad)` for (non-generational) ZGC and ShenandoahGC, because they use `NMethodPatchingType::conc_data_patch`. Other GCs don't need any memory fence instructions. (See `BarrierSetAssembler::nmethod_entry_barrier`.) Exactly like on PPC64, now. Note that the patching epoch implementation is protected by `NMethodPatchingType::conc_instruction_and_data_patch` and is hence not used at all in jdk master. (Only in generational ZGC which is not yet integrated into jdk master.) It may make sense to implement the patching epoch code on PPC64 as well. We could evaluate it, but we should better do that with generational ZGC (like on AArch64). ------------- PR: https://git.openjdk.org/jdk/pull/10632 From kvn at openjdk.org Mon Oct 10 21:10:10 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 10 Oct 2022 21:10:10 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v13] In-Reply-To: References: Message-ID: <66be8SJdxPOqmqsQ1YIwS4zM4GwPerypGIf8IbfxhRs=.1d03c94a-f3e5-40ae-999e-bdd5f328170d@github.com> On Thu, 6 Oct 2022 06:28:04 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated instruct to use kmovw I started new testing. ------------- PR: https://git.openjdk.org/jdk/pull/9781 From mdoerr at openjdk.org Mon Oct 10 22:14:12 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 10 Oct 2022 22:14:12 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 [v3] In-Reply-To: References: Message-ID: > I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. Implemented like on aarch64, now. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Update Copyrigth years. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10632/files - new: https://git.openjdk.org/jdk/pull/10632/files/99d133ba..d9ea8790 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10632&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10632&range=01-02 Stats: 8 lines in 4 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/10632.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10632/head:pull/10632 PR: https://git.openjdk.org/jdk/pull/10632 From mdoerr at openjdk.org Mon Oct 10 22:28:02 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 10 Oct 2022 22:28:02 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 [v4] In-Reply-To: References: Message-ID: > I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. 
Implemented like on aarch64, now. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Use 64 bit load for _nmethod_disarm_value. (Makes a difference on Big Endian\!) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10632/files - new: https://git.openjdk.org/jdk/pull/10632/files/d9ea8790..08b86237 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10632&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10632&range=02-03 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10632.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10632/head:pull/10632 PR: https://git.openjdk.org/jdk/pull/10632 From sspitsyn at openjdk.org Mon Oct 10 23:10:05 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 10 Oct 2022 23:10:05 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v2] In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 10:08:59 GMT, Andrey Turbanov wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> addressed review comments about is_JavaThread_current and @enablePreview tag > > test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java line 41: > >> 39: static private volatile boolean doStop; >> 40: >> 41: static private void sleep(long millis) { > > Let's use default modifiers order > Suggestion: > > private static void sleep(long millis) { Okay, fixed. Thank you for the comment! ------------- PR: https://git.openjdk.org/jdk/pull/10586 From sspitsyn at openjdk.org Mon Oct 10 23:44:05 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 10 Oct 2022 23:44:05 GMT Subject: RFR: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread [v3] In-Reply-To: References: Message-ID: <9EtEbxBte92GHE4_tKJji9mFILJbJ3wA9W6F-7LkY_s=.8460585b-7c6e-4ef7-86ef-2aa6a442028c@github.com> > The spec of JVM TI GetLocalXXX/SetLocalXXX functions is updated to require the target thread to be suspended. If not suspended then the JVMTI_ERROR_THREAD_NOT_SUSPENDED error code is returned by the implementation. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 > > A few tests are impacted by this fix: > > test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest > test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 > > > The following test has been removed as non-relevant any more: > ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` > > New negative test has been added instead: > ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` > > All JVM TI and JPDA tests were used locally for verification. > They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. > > Mach5 test runs on all platforms are TBD. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: make default order of method modifiers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10586/files - new: https://git.openjdk.org/jdk/pull/10586/files/5991659f..6d41ced5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10586&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10586&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10586.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10586/head:pull/10586 PR: https://git.openjdk.org/jdk/pull/10586 From njian at openjdk.org Tue Oct 11 01:08:02 2022 From: njian at openjdk.org (Ningsheng Jian) Date: Tue, 11 Oct 2022 01:08:02 GMT Subject: RFR: 8294261: AArch64: Use pReg instead of pRegGov when possible In-Reply-To: <9Ov_yqj4jJ6VH7DVku23aOX2jyCdBFkscnq4tSIZCHE=.657dda61-0875-4a41-b1e5-2963af0b266b@github.com> References: <9Ov_yqj4jJ6VH7DVku23aOX2jyCdBFkscnq4tSIZCHE=.657dda61-0875-4a41-b1e5-2963af0b266b@github.com> Message-ID: On Sun, 9 Oct 2022 06:01:36 GMT, Xiaohong Gong wrote: > LGTM! Thanks for the review! @XiaohongGong ------------- PR: https://git.openjdk.org/jdk/pull/10461 From njian at openjdk.org Tue Oct 11 01:09:34 2022 From: njian at openjdk.org (Ningsheng Jian) Date: Tue, 11 Oct 2022 01:09:34 GMT Subject: Integrated: 8294261: AArch64: Use pReg instead of pRegGov when possible In-Reply-To: References: Message-ID: On Wed, 28 Sep 2022 05:52:40 GMT, Ningsheng Jian wrote: > Currently we allocate SVE predicate register p0-p6 for pRegGov operand, which are used as governing predicates for load/store and arithmetic, and also define pReg operand for all allocatable predicate registers. Since some SVE instructions are fine to use/define p8-p15, e.g. predicate operations, this patch makes the matcher work for mixed use of pRegGov and pReg, and tries to match pReg when possible. If a predicate reg is defined as pReg but used as pRegGov, register allocator will handle that properly. > > With p8-p15 being used as non-temp register, we need to save them as well when saving all registers. The code of setting predicate reg slot in OopMap in RegisterSaver::save_live_registers() is also removed, because on safepoint, vector masks have been transformed to vector [1]. > > Tested on different SVE systems. Also tested with making RA to allocate p8-p15 first for vReg operand, so that a p8-p15 reg has more chance to be allocated, and if an SVE instruction, emitted by ad rule, does not accept p8-p5, assembler will crash. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vector.cpp#L265 This pull request has now been integrated. Changeset: 4b17d28a Author: Ningsheng Jian URL: https://git.openjdk.org/jdk/commit/4b17d28a6d56726d49090bfd05d945e8f688fe53 Stats: 103 lines in 6 files changed: 6 ins; 17 del; 80 mod 8294261: AArch64: Use pReg instead of pRegGov when possible Reviewed-by: ngasson, xgong ------------- PR: https://git.openjdk.org/jdk/pull/10461 From haosun at openjdk.org Tue Oct 11 01:35:19 2022 From: haosun at openjdk.org (Hao Sun) Date: Tue, 11 Oct 2022 01:35:19 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options Message-ID: In this patch, we implement functions histogram_bytecode() and histogram_bytecode_pair() for interpreter AArch64 part. Similar to count_bytecode(), we use atomic operations to update the counters as well. 
Here shows part of the message produced with -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options after this patch. $ java -XX:+PrintBytecodeHistogram --version | head -20 openjdk 20-internal 2023-03-21 OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) Histogram of 5004099 executed bytecodes: absolute relative code name ---------------------------------------------------------------------- 319124 6.38% dc fast_aload_0 313397 6.26% e0 fast_iload 251436 5.02% b6 invokevirtual 227428 4.54% 19 aload 166054 3.32% a7 goto 159167 3.18% 2b aload_1 151803 3.03% de fast_aaccess_0 136787 2.73% 1b iload_1 124037 2.48% 36 istore 118791 2.37% 84 iinc 118121 2.36% 1c iload_2 110484 2.21% a2 if_icmpge $ java -XX:+PrintBytecodePairHistogram --version | head -20 openjdk 20-internal 2023-03-21 OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) Histogram of 4804441 executed bytecode pairs: absolute relative codes 1st bytecode 2nd bytecode ---------------------------------------------------------------------- 77602 1.615% 84 a7 iinc goto 49749 1.035% 36 e0 istore fast_iload 48931 1.018% e0 10 fast_iload bipush 46294 0.964% e0 b6 fast_iload invokevirtual 42661 0.888% a7 e0 goto fast_iload 42243 0.879% 3a 19 astore aload 40138 0.835% 19 b9 aload invokeinterface 36617 0.762% dc 2b fast_aload_0 aload_1 35745 0.744% b7 dc invokespecial fast_aload_0 35384 0.736% 19 b6 aload invokevirtual 35035 0.729% b6 de invokevirtual fast_aaccess_0 34667 0.722% dc b6 fast_aload_0 invokevirtual In order to verfiy the correctness, I took the trace information produced by -XX:+TraceBytecodes as a cross reference. The hit times for some bytecodes/bytecode pairs can be obtained via parsing the trace. Then I compared the hit times with the corresponding "absolute" columns. I randomly selected several bytecodes/bytecode pairs, and the manual comparion results showed that "absolute" columns are correct. Note-1: count_bytecode() is updated. 1) caller-saved registers are used as temporary registers and there is no need to save/restore them. 2) atomic_addw() should be used since the counter is of int type. Note-2: As shown by the update in file templateInterpreterGenerator.cpp, function histogram_bytecode() should be invoked only inside !PRODUCT scope. 
------------- Commit messages: - 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options Changes: https://git.openjdk.org/jdk/pull/10642/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10642&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295023 Stats: 40 lines in 2 files changed: 28 ins; 7 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10642.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10642/head:pull/10642 PR: https://git.openjdk.org/jdk/pull/10642 From dholmes at openjdk.org Tue Oct 11 01:42:24 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 11 Oct 2022 01:42:24 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v14] In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 11:58:33 GMT, Julian Waters wrote: >> src/java.base/windows/native/libjli/java_md.h line 48: >> >>> 46: */ >>> 47: >>> 48: void reportWithLastWindowsError(const char* message, ...); >> >> Why does this need to be exported in the header file? Are you expecting other code to call this? > > I left that in there since it would possibly be a useful utility to have for other JLI code that might need to work with Windows errors in the future In that case shouldn't the `JLI_xxx` naming scheme be retained? ------------- PR: https://git.openjdk.org/jdk/pull/9749 From dholmes at openjdk.org Tue Oct 11 02:03:25 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 11 Oct 2022 02:03:25 GMT Subject: RFR: 8295017: Remove Windows specific workaround in JLI_Snprintf [v3] In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 11:59:55 GMT, Julian Waters wrote: >> The C99 snprintf is available with Visual Studio 2015 and above, alongside Windows 10 and the UCRT, and is no longer identical to the outdated Windows _snprintf. Since support for the Visual C++ 2017 compiler was removed a while ago, we can now safely remove the compatibility workaround on Windows and have JLI_Snprintf simply delegate to snprintf. > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Comment documenting change isn't required > - Merge branch 'openjdk:master' into patch-1 > - Comment formatting > - Remove Windows specific JLI_Snprintf implementation > - Remove Windows JLI_Snprintf definition Looks good. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/10625 From kvn at openjdk.org Tue Oct 11 02:09:31 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 11 Oct 2022 02:09:31 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v13] In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 06:28:04 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Updated instruct to use kmovw Latest version v12 passed my tier1-4 testing. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9781 From xlinzheng at openjdk.org Tue Oct 11 06:48:04 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Tue, 11 Oct 2022 06:48:04 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible Message-ID: This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. Chaining PR #10421. 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] 2. Performance: conservatively no regressions observed. [3] Having tested several times hotspot tier1~tier4; Testing another turn on board. [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html ------------- Commit messages: - [7] Blacklist mode - [6] RVC: IncompressibleRegions for relocations Changes: https://git.openjdk.org/jdk/pull/10643/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10643&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295110 Stats: 207 lines in 13 files changed: 50 ins; 25 del; 132 mod Patch: https://git.openjdk.org/jdk/pull/10643.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10643/head:pull/10643 PR: https://git.openjdk.org/jdk/pull/10643 From mbaesken at openjdk.org Tue Oct 11 07:14:34 2022 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 11 Oct 2022 07:14:34 GMT Subject: Integrated: JDK-8294901: remove pre-VS2017 checks in Windows related coding In-Reply-To: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> References: <0kJCOAI9EAmXB8M7Lqe14-a5sU5V65_KmMcH5NX0eYw=.76a92302-e8c8-44e3-b76a-2ca543144ddf@github.com> Message-ID: <7Xozge6RtCgeOEOiyaGA2Xh_IKKucAAjfTUrhTzi0RI=.7e91960e-7412-4feb-b718-9230d2123fcf@github.com> On Fri, 7 Oct 2022 07:29:16 GMT, Matthias Baesken wrote: > After "8293162: Drop support for VS2017" limited current support to VS2019 and VS2022 it is most likely safe to remove various checks/workarounds related to older VS compilers like VS2015 or VS2013. This pull request has now been integrated. Changeset: 5e05e421 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/5e05e421ed49158185167c010bd1e4f690eab610 Stats: 33 lines in 3 files changed: 0 ins; 32 del; 1 mod 8294901: remove pre-VS2017 checks in Windows related coding Reviewed-by: dholmes, mdoerr, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/10600 From tschatzl at openjdk.org Tue Oct 11 07:39:25 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 11 Oct 2022 07:39:25 GMT Subject: RFR: 8294238: ZGC: Move CLD claimed mark clearing [v2] In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 12:28:35 GMT, Stefan Karlsson wrote: >> When we claim CLDs during object iteration, we must make sure to have a cleared set of claim bits. Today we ensure this by clearing the bits before object iteration starts. Most GCs perform this clearing during a stop-the-world pause, before the actual GC marking starts. >> >> ZGC, however, performs the clearing concurrently. This requires us to be very careful and never start following object references before the clearing has completed. 
>> >> In the Generational ZGC repository, we changed it so that the code that performs the object iteration cleans up and clears these bits after itself. This has the effect that when marking starts, we know that the claimed bits have been cleared. >> >> I'd like to change the single-generation ZGC to do the same. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Guard verify_not_claimed with ifdef ASSERT Marked as reviewed by tschatzl (Reviewer). src/hotspot/share/classfile/classLoaderDataGraph.hpp line 69: > 67: static void clear_claimed_marks(); > 68: static void clear_claimed_marks(int claim); > 69: static void verify_claimed_marks_cleared(int claim); Is there a reason to not use `NOT_DEBUG_RETURN` for this method like for `ClassLoaderData::verify_not_claimed()`? I would prefer this to be consistent. ------------- PR: https://git.openjdk.org/jdk/pull/10591 From haosun at openjdk.org Tue Oct 11 07:50:28 2022 From: haosun at openjdk.org (Hao Sun) Date: Tue, 11 Oct 2022 07:50:28 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension In-Reply-To: References: Message-ID: On Fri, 23 Sep 2022 11:13:40 GMT, Bhavana Kilambi wrote: > Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - > > eor a, a, b > eor a, a, c > > can be optimized to single instruction - `eor3 a, b, c` > > This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - > > > Benchmark gain > TestEor3.test1Int 10.87% > TestEor3.test1Long 8.84% > TestEor3.test2Int 21.68% > TestEor3.test2Long 21.04% > > > The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. test/hotspot/jtreg/compiler/vectorization/TestEor3AArch64.java line 87: > 85: @Test > 86: @IR(counts = {"veor3_neon", "> 0"}, applyIf = {"MaxVectorSize", "16"}) > 87: @IR(counts = {"veor3_sve", "> 0"}, applyIfAnd = {"UseSVE", "2", "MaxVectorSize", "> 16"}) Suggestion: @IR(counts = {"veor3_sve", "> 0"}, applyIfAnd = {"UseSVE", "2", "MaxVectorSize", "> 16"}, applyIfCPUFeature = {"svesha3", "true"}) After this PR(https://github.com/openjdk/jdk/pull/10402), `applyIf` and `applyIfCPUFeature` are evaluated as a logical conjunction. We can check CPU features and VM options at the same time now. Of course, the comment at line 79 should be removed. ------------- PR: https://git.openjdk.org/jdk/pull/10407 From chagedorn at openjdk.org Tue Oct 11 08:18:08 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 11 Oct 2022 08:18:08 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: References: Message-ID: > The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. 
While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: > > The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. > > Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. > > I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Always read full filename and strip prefix path and only then cut filename to fit output buffer - Merge branch 'master' into JDK-8293422 - Merge branch 'master' into JDK-8293422 - Review comments from Thomas - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ - 8293422: DWARF emitted by Clang cannot be parsed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10287/files - new: https://git.openjdk.org/jdk/pull/10287/files/6fbeee23..24f624f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10287&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10287&range=02-03 Stats: 45094 lines in 1461 files changed: 23780 ins; 13852 del; 7462 mod Patch: https://git.openjdk.org/jdk/pull/10287.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10287/head:pull/10287 PR: https://git.openjdk.org/jdk/pull/10287 From chagedorn at openjdk.org Tue Oct 11 08:18:11 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 11 Oct 2022 08:18:11 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v3] In-Reply-To: References: Message-ID: On Thu, 22 Sep 2022 11:55:56 GMT, Thomas Schatzl wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments from Thomas > > src/hotspot/share/utilities/elfFile.cpp line 1613: > >> 1611: if (current_index == file_index) { >> 1612: // Found correct file. 
>> 1613: strip_path_prefix(filename, filename_len); > > After some digging I believe this is the wrong place to strip the path prefix, and causes the strange workaround in the `decoder_get_source_info_valid_truncated` gtest. > > The call to `_reader.read_string()` above should do the stripping as it is read; if like in the gtest, the given `filename_len` is too small to contain the original string, the `strip_path_prefix` tries to strip the too small buffer, but what has actually been intended was probably stripping the entire path and then limiting the return value. > > I.e. a more useful implementation of this would be reading the string into a temporary buffer, stripping the path prefix and then copying the result to the output buffer. > > I can see the reasoning for why the current implementation is as is, it is nowhere specified what the contents of the filename string in `debug_aranges` should be, and what should be actually printed. > > Looking at callers of this method, it might actually a problem when using clang: `VMError::print_native_stack` uses a 128 byte buffer, `NativeCallStack::print_on` uses a 1024 byte buffer. > > I can see that at least 128 bytes would be just a bit small, but then we (Oracle) do not use clang. It's up to you to fix this imo. Sorry for the delay to come back to this. I think you are right. We should always first strip the prefix path and only then cut the filename to fit it into the `filename` output buffer. I've pushed an update that with a temporary buffer of size 1024 and reverted the gtest changes. Now it works with Clang and GCC without modifying the test. ------------- PR: https://git.openjdk.org/jdk/pull/10287 From stefank at openjdk.org Tue Oct 11 09:05:21 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 11 Oct 2022 09:05:21 GMT Subject: RFR: 8294238: ZGC: Move CLD claimed mark clearing [v2] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 07:31:04 GMT, Thomas Schatzl wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Guard verify_not_claimed with ifdef ASSERT > > src/hotspot/share/classfile/classLoaderDataGraph.hpp line 69: > >> 67: static void clear_claimed_marks(); >> 68: static void clear_claimed_marks(int claim); >> 69: static void verify_claimed_marks_cleared(int claim); > > Is there a reason to not use `NOT_DEBUG_RETURN` for this method like for `ClassLoaderData::verify_not_claimed()`? I would prefer this to be consistent. I just think the NOT_DEBUG_RETURN / ASSERT macros pollute the code and makes it ugly to read, so I opted to not do it. This function is absolutely not performance critical, so I chose this approach. I can be compelled to change it, if Reviewers prefer to change the code to use NOT_DEBUG_RETURN . ------------- PR: https://git.openjdk.org/jdk/pull/10591 From aph at openjdk.org Tue Oct 11 09:07:26 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 11 Oct 2022 09:07:26 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 [v2] In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 20:58:33 GMT, Martin Doerr wrote: > > Might it make sense for PPC to do what AArch64 does? We don't need any memory fence instructions on the fast path. > > We are actually doing what AArch64 does, now. AArch64 still uses `membar(__ LoadLoad)` for (non-generational) ZGC and ShenandoahGC, because they use `NMethodPatchingType::conc_data_patch`. Other GCs don't need any memory fence instructions. Ah, okay. 
On AArch64 we only issue a `LoadLoad` for `conc_data_patch` so I think we're good. ------------- PR: https://git.openjdk.org/jdk/pull/10632 From mdoerr at openjdk.org Tue Oct 11 09:13:31 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 11 Oct 2022 09:13:31 GMT Subject: RFR: 8294580: frame::interpreter_frame_print_on() crashes if free BasicObjectLock exists in frame In-Reply-To: References: Message-ID: On Thu, 29 Sep 2022 12:49:27 GMT, Richard Reingruber wrote: > Add null check before dereferencing BasicObjectLock::_obj. > BasicObjectLocks are marked as free by setting _obj to null. > > I've done manual testing: > > > ./images/jdk/bin/java -Xlog:continuations=trace -XX:+VerifyContinuations --enable-preview VTSleepAfterUnlock > > > with the test attached to the JBS item. > > Example output: > > > [0.349s][trace][continuations] Interpreted frame (sp=0x000000011d5c6398 unextended sp=0x000000011d5c63b8, fp=0x000000011d5c6420, real_fp=0x000000011d5c6420, pc=0x00007f0ff0199c6a) > [0.349s][trace][continuations] ~return entry points [0x00007f0ff0199820, 0x00007f0ff019a2e8] 2760 bytes > [0.349s][trace][continuations] - local [0x000000011d5c3550]; #0 > [0.349s][trace][continuations] - local [0x000000011d5c3550]; #1 > [0.349s][trace][continuations] - local [0x0000000000000000]; #2 > [0.349s][trace][continuations] - stack [0x0000000000000064]; #1 > [0.349s][trace][continuations] - stack [0x0000000000000000]; #0 > [0.349s][trace][continuations] - obj [null] > [0.349s][trace][continuations] - lock [monitor mark(is_neutral no_hash age=0)] > [0.349s][trace][continuations] - monitor[0x000000011d5c63d8] > [0.349s][trace][continuations] - bcp [0x00007f0fa8400401]; @17 > [0.349s][trace][continuations] - locals [0x000000011d5c6440] > [0.349s][trace][continuations] - method [0x00007f0fa8400430]; virtual void VTSleepAfterUnlock.sleepAfterUnlock() LGTM. Thanks for fixing! ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.org/jdk/pull/10486 From eosterlund at openjdk.org Tue Oct 11 09:14:25 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 11 Oct 2022 09:14:25 GMT Subject: RFR: 8294238: ZGC: Move CLD claimed mark clearing [v2] In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 12:28:35 GMT, Stefan Karlsson wrote: >> When we claim CLDs during object iteration, we must make sure to have a cleared set of claim bits. Today we ensure this by clearing the bits before object iteration starts. Most GCs perform this clearing during a stop-the-world pause, before the actual GC marking starts. >> >> ZGC, however, performs the clearing concurrently. This requires us to be very careful and never start following object references before the clearing has completed. >> >> In the Generational ZGC repository, we changed it so that the code that performs the object iteration cleans up and clears these bits after itself. This has the effect that when marking starts, we know that the claimed bits have been cleared. >> >> I'd like to change the single-generation ZGC to do the same. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Guard verify_not_claimed with ifdef ASSERT Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10591 From tschatzl at openjdk.org Tue Oct 11 09:23:29 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 11 Oct 2022 09:23:29 GMT Subject: RFR: 8294238: ZGC: Move CLD claimed mark clearing [v2] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 09:01:32 GMT, Stefan Karlsson wrote: >> src/hotspot/share/classfile/classLoaderDataGraph.hpp line 69: >> >>> 67: static void clear_claimed_marks(); >>> 68: static void clear_claimed_marks(int claim); >>> 69: static void verify_claimed_marks_cleared(int claim); >> >> Is there a reason to not use `NOT_DEBUG_RETURN` for this method like for `ClassLoaderData::verify_not_claimed()`? I would prefer this to be consistent. > > I just think the NOT_DEBUG_RETURN / ASSERT macros pollute the code and makes it ugly to read, so I opted to not do it. This function is absolutely not performance critical, so I chose this approach. > > I can be compelled to change it, if Reviewers prefer to change the code to use NOT_DEBUG_RETURN . I am fine with the change as is, just noting that this change uses it in one place and does not in the other, which seemed inconsistent to me. ------------- PR: https://git.openjdk.org/jdk/pull/10591 From jsjolen at openjdk.org Tue Oct 11 09:28:27 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 11 Oct 2022 09:28:27 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL Message-ID: Hi! This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. ------------- Commit messages: - Outline of conversion to UL Changes: https://git.openjdk.org/jdk/pull/10645/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10645&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295060 Stats: 113 lines in 7 files changed: 39 ins; 39 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/10645.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10645/head:pull/10645 PR: https://git.openjdk.org/jdk/pull/10645 From jsjolen at openjdk.org Tue Oct 11 09:28:28 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 11 Oct 2022 09:28:28 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 09:21:46 GMT, Johan Sj?len wrote: > Hi! > > This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. src/hotspot/share/runtime/vframeArray.cpp line 338: > 336: > 337: #ifndef PRODUCT > 338: auto log_deopt = [](int i, intptr_t* addr) { This usage of lambdas is needed to avoid crossing initialization in the switch statement. ------------- PR: https://git.openjdk.org/jdk/pull/10645 From aph at openjdk.org Tue Oct 11 09:32:52 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 11 Oct 2022 09:32:52 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 01:27:23 GMT, Hao Sun wrote: > In this patch, we implement functions histogram_bytecode() and histogram_bytecode_pair() for interpreter AArch64 part. Similar to count_bytecode(), we use atomic operations to update the counters as well. > > Here shows part of the message produced with -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options after this patch. 
> > > $ java -XX:+PrintBytecodeHistogram --version | head -20 > openjdk 20-internal 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) > > Histogram of 5004099 executed bytecodes: > > absolute relative code name > ---------------------------------------------------------------------- > 319124 6.38% dc fast_aload_0 > 313397 6.26% e0 fast_iload > 251436 5.02% b6 invokevirtual > 227428 4.54% 19 aload > 166054 3.32% a7 goto > 159167 3.18% 2b aload_1 > 151803 3.03% de fast_aaccess_0 > 136787 2.73% 1b iload_1 > 124037 2.48% 36 istore > 118791 2.37% 84 iinc > 118121 2.36% 1c iload_2 > 110484 2.21% a2 if_icmpge > > $ java -XX:+PrintBytecodePairHistogram --version | head -20 > openjdk 20-internal 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) > > Histogram of 4804441 executed bytecode pairs: > > absolute relative codes 1st bytecode 2nd bytecode > ---------------------------------------------------------------------- > 77602 1.615% 84 a7 iinc goto > 49749 1.035% 36 e0 istore fast_iload > 48931 1.018% e0 10 fast_iload bipush > 46294 0.964% e0 b6 fast_iload invokevirtual > 42661 0.888% a7 e0 goto fast_iload > 42243 0.879% 3a 19 astore aload > 40138 0.835% 19 b9 aload invokeinterface > 36617 0.762% dc 2b fast_aload_0 aload_1 > 35745 0.744% b7 dc invokespecial fast_aload_0 > 35384 0.736% 19 b6 aload invokevirtual > 35035 0.729% b6 de invokevirtual fast_aaccess_0 > 34667 0.722% dc b6 fast_aload_0 invokevirtual > > > In order to verfiy the correctness, I took the trace information produced by -XX:+TraceBytecodes as a cross reference. The hit times for some bytecodes/bytecode pairs can be obtained via parsing the trace. Then I compared the hit times with the corresponding "absolute" columns. I randomly selected several bytecodes/bytecode pairs, and the manual comparion results showed that "absolute" columns are correct. > > Note-1: count_bytecode() is updated. 1) caller-saved registers are used as temporary registers and there is no need to save/restore them. 2) atomic_addw() should be used since the counter is of int type. > > Note-2: As shown by the update in file templateInterpreterGenerator.cpp, function histogram_bytecode() should be invoked only inside !PRODUCT scope. src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1981: > 1979: > 1980: void TemplateInterpreterGenerator::count_bytecode() { > 1981: Register rscratch3 = r10; Please pass the scratch register to use as an argument to `TemplateInterpreterGenerator::generate_trace_code` src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1987: > 1985: > 1986: void TemplateInterpreterGenerator::histogram_bytecode(Template* t) { > 1987: Register rscratch3 = r10; Please pass the scratch register to use as an argument to `TemplateInterpreterGenerator::histogram_bytecode` src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 2008: > 2006: BytecodePairHistogram::log2_number_of_codes); > 2007: __ stxrw(rscratch2, index, index_addr); > 2008: __ cbnzw(rscratch2, L); // retry to load _index Please add `atomic_ldorrw` to the list of `ATOMIC_OP`s (in macroAssembler_aarch64.cpp) and use it here. 
------------- PR: https://git.openjdk.org/jdk/pull/10642 From rrich at openjdk.org Tue Oct 11 09:58:53 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 11 Oct 2022 09:58:53 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 [v4] In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 22:28:02 GMT, Martin Doerr wrote: >> I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. Implemented like on aarch64, now. Also use 64 bit load for `uint64_t _nmethod_disarm_value` (not relevant for Little Endian). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Use 64 bit load for _nmethod_disarm_value. (Makes a difference on Big Endian\!) Changes look good! Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR: https://git.openjdk.org/jdk/pull/10632 From mdoerr at openjdk.org Tue Oct 11 10:14:30 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 11 Oct 2022 10:14:30 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 [v4] In-Reply-To: References: Message-ID: <8tzBoN0RtyIW9miMSZC2OL7bjYfmnpV7cv27O6nG6aM=.7fb95192-8269-43bc-b100-05fbb5c7952f@github.com> On Mon, 10 Oct 2022 22:28:02 GMT, Martin Doerr wrote: >> I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. Implemented like on aarch64, now. Also use 64 bit load for `uint64_t _nmethod_disarm_value` (not relevant for Little Endian). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Use 64 bit load for _nmethod_disarm_value. (Makes a difference on Big Endian\!) Thanks for reviewing! Note: I have tested the new code on Big Endian linux as well. My latest commit avoids hitting the C++ barrier code unnecessarily often. ------------- PR: https://git.openjdk.org/jdk/pull/10632 From tschatzl at openjdk.org Tue Oct 11 10:43:25 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 11 Oct 2022 10:43:25 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 08:18:08 GMT, Christian Hagedorn wrote: >> The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. 
>> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Always read full filename and strip prefix path and only then cut filename to fit output buffer > - Merge branch 'master' into JDK-8293422 > - Merge branch 'master' into JDK-8293422 > - Review comments from Thomas > - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ > - 8293422: DWARF emitted by Clang cannot be parsed Lgtm. Thanks. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.org/jdk/pull/10287 From chagedorn at openjdk.org Tue Oct 11 11:18:07 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 11 Oct 2022 11:18:07 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 08:18:08 GMT, Christian Hagedorn wrote: >> The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. >> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - Always read full filename and strip prefix path and only then cut filename to fit output buffer > - Merge branch 'master' into JDK-8293422 > - Merge branch 'master' into JDK-8293422 > - Review comments from Thomas > - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ > - 8293422: DWARF emitted by Clang cannot be parsed Thanks Thomas for your review! ------------- PR: https://git.openjdk.org/jdk/pull/10287 From lucy at openjdk.org Tue Oct 11 13:39:36 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 11 Oct 2022 13:39:36 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 [v4] In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 22:28:02 GMT, Martin Doerr wrote: >> I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. Implemented like on aarch64, now. Also use 64 bit load for `uint64_t _nmethod_disarm_value` (not relevant for Little Endian). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Use 64 bit load for _nmethod_disarm_value. (Makes a difference on Big Endian\!) Changes look good to me. ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.org/jdk/pull/10632 From mdoerr at openjdk.org Tue Oct 11 13:39:37 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 11 Oct 2022 13:39:37 GMT Subject: RFR: 8295069: [PPC64] Performance regression after JDK-8290025 [v4] In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 22:28:02 GMT, Martin Doerr wrote: >> I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. Implemented like on aarch64, now. Also use 64 bit load for `uint64_t _nmethod_disarm_value` (not relevant for Little Endian). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Use 64 bit load for _nmethod_disarm_value. (Makes a difference on Big Endian\!) Thanks for the review! ------------- PR: https://git.openjdk.org/jdk/pull/10632 From stefank at openjdk.org Tue Oct 11 13:48:45 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 11 Oct 2022 13:48:45 GMT Subject: RFR: 8294238: ZGC: Move CLD claimed mark clearing [v3] In-Reply-To: References: Message-ID: > When we claim CLDs during object iteration, we must make sure to have a cleared set of claim bits. Today we ensure this by clearing the bits before object iteration starts. Most GCs perform this clearing during a stop-the-world pause, before the actual GC marking starts. > > ZGC, however, performs the clearing concurrently. This requires us to be very careful and never start following object references before the clearing has completed. > > In the Generational ZGC repository, we changed it so that the code that performs the object iteration cleans up and clears these bits after itself. This has the effect that when marking starts, we know that the claimed bits have been cleared. > > I'd like to change the single-generation ZGC to do the same. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into 8294238_zgc_move_cld_claimed_clear - Guard verify_not_claimed with ifdef ASSERT - 8294238: ZGC: Move CLD claimed mark clearing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10591/files - new: https://git.openjdk.org/jdk/pull/10591/files/d7fdc8e3..2c060ba7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10591&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10591&range=01-02 Stats: 30955 lines in 781 files changed: 17954 ins; 9656 del; 3345 mod Patch: https://git.openjdk.org/jdk/pull/10591.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10591/head:pull/10591 PR: https://git.openjdk.org/jdk/pull/10591 From rriggs at openjdk.org Tue Oct 11 15:41:13 2022 From: rriggs at openjdk.org (Roger Riggs) Date: Tue, 11 Oct 2022 15:41:13 GMT Subject: RFR: 8291917: Windows - Improve error messages when the C Runtime Libraries or jvm.dll cannot be loaded [v14] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 01:38:52 GMT, David Holmes wrote: >> I left that in there since it would possibly be a useful utility to have for other JLI code that might need to work with Windows errors in the future > > In that case shouldn't the `JLI_xxx` naming scheme be retained? $0.02, leave it for local use until its needed elsewhere, it is easier to maintain if the scope of use is unambiguous. ------------- PR: https://git.openjdk.org/jdk/pull/9749 From aph at openjdk.org Tue Oct 11 16:54:19 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 11 Oct 2022 16:54:19 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic Message-ID: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
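A rough sketch of the save/restore idea (illustrative only; the wrapper name is invented, and the real fix would presumably sit wherever the JDK already funnels its dlopen() calls rather than in a free-standing helper):

    #include <dlfcn.h>
    #include <fenv.h>

    // Illustrative wrapper (name invented): load a DSO without letting its
    // static constructors -- e.g. crtfastmath.o pulled in by -ffast-math --
    // permanently change this thread's floating-point control state.
    static void* dlopen_preserving_fp_env(const char* name, int flags) {
      fenv_t saved;
      fegetenv(&saved);                    // capture current FP environment
      void* handle = dlopen(name, flags);  // constructors may set FTZ/DAZ here
      fesetenv(&saved);                    // put the control state back
      return handle;
    }

On glibc/x86-64 the saved fenv_t covers the SSE MXCSR, where the FTZ/DAZ bits set by crtfastmath.o live; whether that holds on every platform would need checking per port.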
------------- Commit messages: - Whitespace - 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic - 8295159: test cases Changes: https://git.openjdk.org/jdk/pull/10661/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295159 Stats: 193 lines in 5 files changed: 171 ins; 19 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Tue Oct 11 16:54:20 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 11 Oct 2022 16:54:20 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Tue, 11 Oct 2022 16:02:41 GMT, Andrew Haley wrote: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. Also: this patch is Linux-only. I'll ask for help from build experts to make the tests GCC-only; it's not clear to me how. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From svkamath at openjdk.org Tue Oct 11 17:03:24 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Tue, 11 Oct 2022 17:03:24 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v13] In-Reply-To: <66be8SJdxPOqmqsQ1YIwS4zM4GwPerypGIf8IbfxhRs=.1d03c94a-f3e5-40ae-999e-bdd5f328170d@github.com> References: <66be8SJdxPOqmqsQ1YIwS4zM4GwPerypGIf8IbfxhRs=.1d03c94a-f3e5-40ae-999e-bdd5f328170d@github.com> Message-ID: On Mon, 10 Oct 2022 21:05:58 GMT, Vladimir Kozlov wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated instruct to use kmovw > > I started new testing. @vnkozlov Thank you for reviewing the patch. ------------- PR: https://git.openjdk.org/jdk/pull/9781 From svkamath at openjdk.org Tue Oct 11 17:08:51 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Tue, 11 Oct 2022 17:08:51 GMT Subject: Integrated: 8289552: Make intrinsic conversions between bit representations of half precision values and floats In-Reply-To: References: Message-ID: On Fri, 5 Aug 2022 16:36:23 GMT, Smita Kamath wrote: > 8289552: Make intrinsic conversions between bit representations of half precision values and floats This pull request has now been integrated. 
Changeset: 07946aa4 Author: Smita Kamath Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/07946aa49c97c93bd11675a9b0b90d07c83f2a94 Stats: 350 lines in 19 files changed: 339 ins; 5 del; 6 mod 8289552: Make intrinsic conversions between bit representations of half precision values and floats Reviewed-by: kvn, sviswanathan, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/9781 From shade at openjdk.org Tue Oct 11 18:14:17 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 11 Oct 2022 18:14:17 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 10:23:04 GMT, Roman Kennke wrote: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. 
> > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? > > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. 
> > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 496.076 | 493.873 | 0.45% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaKmeans | 259.384 | 258.648 | 0.28% > Philosophers | 24333.311 | 23438.22 | 3.82% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > ParMnemonics | 2016.917 | 2033.101 | -0.80% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaDoku | 2193.562 | 1958.419 | 12.01% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > Philosophers | 14268.449 | 13308.87 | 7.21% > FinagleChirper | 4722.13 | 4688.3 | 0.72% > FinagleHttp | 3497.241 | 3605.118 | -2.99% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). 
> > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) I have a few questions after porting this to RISC-V... src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 272: > 270: // SharedRuntime::OSR_migration_begin() packs BasicObjectLocks in > 271: // the OSR buffer using 2 word entries: first the lock and then > 272: // the oop. This comment is now irrelevant? src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 432: > 430: if (method()->is_synchronized()) { > 431: monitor_address(0, FrameMap::r0_opr); > 432: __ ldr(r4, Address(r0, BasicObjectLock::obj_offset_in_bytes())); Do we have to use a new register here, or can we just reuse `r0`? src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1886: > 1884: > 1885: __ mov(c_rarg0, obj_reg); > 1886: __ mov(c_rarg1, rthread); Now that you dropped an argument here, you need to do `__ call_VM_leaf` with `2`, not with `3` arguments? ------------- PR: https://git.openjdk.org/jdk/pull/10590 From dcubed at openjdk.org Tue Oct 11 18:52:08 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 11 Oct 2022 18:52:08 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <_1A5xHTrbMO1_G-ctty0HXkdBhoTSPRo_MRMFktm1xY=.aded8e35-ba59-433e-8e41-af3b80ffeca6@github.com> On Tue, 11 Oct 2022 16:02:41 GMT, Andrew Haley wrote: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. It seems strange to me that the native library part is here: test/hotspot/jtreg/runtime/jni/TestDenormalFloat/libfast-math.c and the two test files are here: test/hotspot/jtreg/compiler/floatingpoint/TestDenormalDouble.java test/hotspot/jtreg/compiler/floatingpoint/TestDenormalFloat.java And the two tests don't have "@run main/native"... Maybe I'm missing something about what you're trying to test here. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From mdoerr at openjdk.org Tue Oct 11 19:25:59 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 11 Oct 2022 19:25:59 GMT Subject: Integrated: 8295069: [PPC64] Performance regression after JDK-8290025 In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 14:02:09 GMT, Martin Doerr wrote: > I have a proposal to mitigate the performance regression (see JBS issue) on PPC64 significantly. We get most of the performance back. Please review. Implemented like on aarch64, now. Also use 64 bit load for `uint64_t _nmethod_disarm_value` (not relevant for Little Endian). This pull request has now been integrated. 
Changeset: 945950d8 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/945950d863ebe984e099d83f967adce71892bb95 Stats: 36 lines in 5 files changed: 20 ins; 1 del; 15 mod 8295069: [PPC64] Performance regression after JDK-8290025 Reviewed-by: rrich, lucy ------------- PR: https://git.openjdk.org/jdk/pull/10632 From lmesnik at openjdk.org Tue Oct 11 19:34:12 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 11 Oct 2022 19:34:12 GMT Subject: RFR: 8294486: Remove vmTestbase/nsk/jvmti/ tests ported to serviceability/jvmti. Message-ID: The fix removes the nsk/jvmti/ tests ported to serviceability/jvmti and forward-ports the corresponding fixes. The suspend/resume tests require more work, covered by https://bugs.openjdk.org/browse/JDK-8295169. ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/10665/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10665&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294486 Stats: 22719 lines in 219 files changed: 50 ins; 22652 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/10665.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10665/head:pull/10665 PR: https://git.openjdk.org/jdk/pull/10665 From rkennke at openjdk.org Tue Oct 11 19:49:32 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 11 Oct 2022 19:49:32 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v2] In-Reply-To: References: Message-ID: <4G3892Q41Qwlt15Y1dmLWkNUmyIEusWVJH2fdb3K0eM=.5ff1859b-baa1-4d60-866b-8e9747a79180@github.com>
It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? 
> > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. > > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
> > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 496.076 | 493.873 | 0.45% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaKmeans | 259.384 | 258.648 | 0.28% > Philosophers | 24333.311 | 23438.22 | 3.82% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > ParMnemonics | 2016.917 | 2033.101 | -0.80% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaDoku | 2193.562 | 1958.419 | 12.01% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > Philosophers | 14268.449 | 13308.87 | 7.21% > FinagleChirper | 4722.13 | 4688.3 | 0.72% > FinagleHttp | 3497.241 | 3605.118 | -2.99% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). > > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. 
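For readers following the lock-stack description above, here is a minimal, self-contained C++ sketch of the idea. The names (`ToyObject`, `fast_lock`, a `thread_local` vector) and the header encoding are deliberately simplified assumptions, not the actual HotSpot types or the code in this PR; see the PR diff for the real implementation.

```c++
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model: the low two bits of the header word are 01 when unlocked
// and 00 when fast-locked; upper bits (hash etc.) are left untouched.
struct ToyObject { std::atomic<uintptr_t> header{0x1}; };

// Per-thread "lock stack": simply the objects this thread has fast-locked.
thread_local std::vector<ToyObject*> lock_stack;

bool fast_lock(ToyObject* obj) {
  uintptr_t unlocked = obj->header.load() | 0x1;    // expected value: unlocked
  uintptr_t locked   = unlocked & ~uintptr_t(0x3);  // same word with low bits 00
  if (obj->header.compare_exchange_strong(unlocked, locked)) {
    lock_stack.push_back(obj);                      // ownership = membership in my stack
    return true;                                    // uncontended fast path
  }
  return false;  // already locked (recursive or contended): caller inflates to a monitor
}

void fast_unlock(ToyObject* obj) {
  assert(!lock_stack.empty() && lock_stack.back() == obj);
  lock_stack.pop_back();
  obj->header.fetch_or(0x1);   // toy only: the real exit path must CAS and
                               // cope with the lock having been inflated meanwhile
}

// 'Does the current thread own me?' is just a scan of a tiny array.
bool current_thread_owns(const ToyObject* obj) {
  for (ToyObject* o : lock_stack) {
    if (o == obj) return true;
  }
  return false;
}
```

In this toy model a recursive monitorenter on an already fast-locked object simply fails the CAS and falls back to inflation, which matches the "no recursive fast-locking (yet)" note above.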
> > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix number of rt args to complete_monitor_locking_C, remove some comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10590/files - new: https://git.openjdk.org/jdk/pull/10590/files/3ed51053..34bed54f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=00-01 Stats: 13 lines in 6 files changed: 0 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10590.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10590/head:pull/10590 PR: https://git.openjdk.org/jdk/pull/10590 From rkennke at openjdk.org Tue Oct 11 20:01:32 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 11 Oct 2022 20:01:32 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: References: Message-ID: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. 
What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? 
> > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. > > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
> > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 496.076 | 493.873 | 0.45% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaKmeans | 259.384 | 258.648 | 0.28% > Philosophers | 24333.311 | 23438.22 | 3.82% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > ParMnemonics | 2016.917 | 2033.101 | -0.80% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaDoku | 2193.562 | 1958.419 | 12.01% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > Philosophers | 14268.449 | 13308.87 | 7.21% > FinagleChirper | 4722.13 | 4688.3 | 0.72% > FinagleHttp | 3497.241 | 3605.118 | -2.99% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). > > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. 
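To make the ANONYMOUS_OWNER hand-over described above a little more concrete, here is a rough, hedged sketch in the same toy style; the types and the `inflate_for_contention`/`monitor_exit` helpers are illustrative assumptions, not the code in this PR.

```c++
#include <atomic>
#include <cstdint>

// Toy model of inflation by a contending thread and the later owner fix-up.
struct ToyMonitor {
  std::atomic<void*> owner{nullptr};
  // wait queue, recursion count, etc. elided
};
struct ToyObject {
  std::atomic<uintptr_t>   header{0x1};       // low bits: 01 unlocked, 00 fast-locked
  std::atomic<ToyMonitor*> monitor{nullptr};  // stand-in for the tagged header pointer
};
static void* const ANONYMOUS_OWNER = reinterpret_cast<void*>(uintptr_t(1));

// Contending thread: it can see that obj is fast-locked, but not by whom,
// so it inflates with a placeholder owner and then blocks on the monitor.
ToyMonitor* inflate_for_contention(ToyObject* obj) {
  ToyMonitor* m = new ToyMonitor();
  m->owner.store(ANONYMOUS_OWNER);
  ToyMonitor* expected = nullptr;
  if (!obj->monitor.compare_exchange_strong(expected, m)) {
    delete m;          // somebody else inflated first; use theirs
    m = expected;
  }
  return m;
}

// Owning thread, at monitorexit: if the owner field still says "anonymous",
// it can only be this thread, so it claims ownership and then exits normally.
void monitor_exit(ToyObject* obj, void* self) {
  ToyMonitor* m = obj->monitor.load();
  if (m != nullptr && m->owner.load() == ANONYMOUS_OWNER) {
    m->owner.store(self);
  }
  // ... pop obj from the lock stack, clear the owner, wake a successor ...
}
```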
> > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'origin/fast-locking' into fast-locking - Re-use r0 in call to unlock_object() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10590/files - new: https://git.openjdk.org/jdk/pull/10590/files/34bed54f..4ccdab8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=01-02 Stats: 7 lines in 3 files changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10590.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10590/head:pull/10590 PR: https://git.openjdk.org/jdk/pull/10590 From rkennke at openjdk.org Tue Oct 11 20:01:33 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 11 Oct 2022 20:01:33 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 13:25:30 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/fast-locking' into fast-locking >> - Re-use r0 in call to unlock_object() > > src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 272: > >> 270: // SharedRuntime::OSR_migration_begin() packs BasicObjectLocks in >> 271: // the OSR buffer using 2 word entries: first the lock and then >> 272: // the oop. > > This comment is now irrelevant? Yes, removed it there and in same files in other arches. > src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 432: > >> 430: if (method()->is_synchronized()) { >> 431: monitor_address(0, FrameMap::r0_opr); >> 432: __ ldr(r4, Address(r0, BasicObjectLock::obj_offset_in_bytes())); > > Do we have to use a new register here, or can we just reuse `r0`? r0 is used below in call to unlock_object(), but not actually used there. I shuffled it a little and re-use r0 now. > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1886: > >> 1884: >> 1885: __ mov(c_rarg0, obj_reg); >> 1886: __ mov(c_rarg1, rthread); > > Now that you dropped an argument here, you need to do `__ call_VM_leaf` with `2`, not with `3` arguments? Good catch! Yes. ------------- PR: https://git.openjdk.org/jdk/pull/10590 From rehn at openjdk.org Tue Oct 11 20:44:06 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 11 Oct 2022 20:44:06 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 20:01:32 GMT, Roman Kennke wrote: >> This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). 
The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> ### Benchmarks >> >> All benchmarks are run on server-class metal machines. 
The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. >> >> #### DaCapo/AArch64 >> >> Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? >> >> benchmark | baseline | fast-locking | % | size >> -- | -- | -- | -- | -- >> avrora | 27859 | 27563 | 1.07% | large >> batik | 20786 | 20847 | -0.29% | large >> biojava | 27421 | 27334 | 0.32% | default >> eclipse | 59918 | 60522 | -1.00% | large >> fop | 3670 | 3678 | -0.22% | default >> graphchi | 2088 | 2060 | 1.36% | default >> h2 | 297391 | 291292 | 2.09% | huge >> jme | 8762 | 8877 | -1.30% | default >> jython | 18938 | 18878 | 0.32% | default >> luindex | 1339 | 1325 | 1.06% | default >> lusearch | 918 | 936 | -1.92% | default >> pmd | 58291 | 58423 | -0.23% | large >> sunflow | 32617 | 24961 | 30.67% | large >> tomcat | 25481 | 25992 | -1.97% | large >> tradebeans | 314640 | 311706 | 0.94% | huge >> tradesoap | 107473 | 110246 | -2.52% | huge >> xalan | 6047 | 5882 | 2.81% | default >> zxing | 970 | 926 | 4.75% | default >> >> #### DaCapo/x86_64 >> >> The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. >> >> benchmark | baseline | fast-Locking | % | size >> -- | -- | -- | -- | -- >> avrora | 127690 | 126749 | 0.74% | large >> batik | 12736 | 12641 | 0.75% | large >> biojava | 15423 | 15404 | 0.12% | default >> eclipse | 41174 | 41498 | -0.78% | large >> fop | 2184 | 2172 | 0.55% | default >> graphchi | 1579 | 1560 | 1.22% | default >> h2 | 227614 | 230040 | -1.05% | huge >> jme | 8591 | 8398 | 2.30% | default >> jython | 13473 | 13356 | 0.88% | default >> luindex | 824 | 813 | 1.35% | default >> lusearch | 962 | 968 | -0.62% | default >> pmd | 40827 | 39654 | 2.96% | large >> sunflow | 53362 | 43475 | 22.74% | large >> tomcat | 27549 | 28029 | -1.71% | large >> tradebeans | 190757 | 190994 | -0.12% | huge >> tradesoap | 68099 | 67934 | 0.24% | huge >> xalan | 7969 | 8178 | -2.56% | default >> zxing | 1176 | 1148 | 2.44% | default >> >> #### Renaissance/AArch64 >> >> This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
>> >> benchmark | baseline | fast-locking | % >> -- | -- | -- | -- >> AkkaUct | 2558.832 | 2513.594 | 1.80% >> Reactors | 14715.626 | 14311.246 | 2.83% >> Als | 1851.485 | 1869.622 | -0.97% >> ChiSquare | 1007.788 | 1003.165 | 0.46% >> GaussMix | 1157.491 | 1149.969 | 0.65% >> LogRegression | 717.772 | 733.576 | -2.15% >> MovieLens | 7916.181 | 8002.226 | -1.08% >> NaiveBayes | 395.296 | 386.611 | 2.25% >> PageRank | 4294.939 | 4346.333 | -1.18% >> FjKmeans | 496.076 | 493.873 | 0.45% >> FutureGenetic | 2578.504 | 2589.255 | -0.42% >> Mnemonics | 4898.886 | 4903.689 | -0.10% >> ParMnemonics | 4260.507 | 4210.121 | 1.20% >> Scrabble | 139.37 | 138.312 | 0.76% >> RxScrabble | 320.114 | 322.651 | -0.79% >> Dotty | 1056.543 | 1068.492 | -1.12% >> ScalaDoku | 3443.117 | 3449.477 | -0.18% >> ScalaKmeans | 259.384 | 258.648 | 0.28% >> Philosophers | 24333.311 | 23438.22 | 3.82% >> ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% >> FinagleChirper | 6814.192 | 6853.38 | -0.57% >> FinagleHttp | 4762.902 | 4807.564 | -0.93% >> >> #### Renaissance/x86_64 >> >> benchmark | baseline | fast-locking | % >> -- | -- | -- | -- >> AkkaUct | 1117.185 | 1116.425 | 0.07% >> Reactors | 11561.354 | 11812.499 | -2.13% >> Als | 1580.838 | 1575.318 | 0.35% >> ChiSquare | 459.601 | 467.109 | -1.61% >> GaussMix | 705.944 | 685.595 | 2.97% >> LogRegression | 659.944 | 656.428 | 0.54% >> MovieLens | 7434.303 | 7592.271 | -2.08% >> NaiveBayes | 413.482 | 417.369 | -0.93% >> PageRank | 3259.233 | 3276.589 | -0.53% >> FjKmeans | 946.429 | 938.991 | 0.79% >> FutureGenetic | 1760.672 | 1815.272 | -3.01% >> ParMnemonics | 2016.917 | 2033.101 | -0.80% >> Scrabble | 147.996 | 150.084 | -1.39% >> RxScrabble | 177.755 | 177.956 | -0.11% >> Dotty | 673.754 | 683.919 | -1.49% >> ScalaDoku | 2193.562 | 1958.419 | 12.01% >> ScalaKmeans | 165.376 | 168.925 | -2.10% >> ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% >> Philosophers | 14268.449 | 13308.87 | 7.21% >> FinagleChirper | 4722.13 | 4688.3 | 0.72% >> FinagleHttp | 3497.241 | 3605.118 | -2.99% >> >> Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. >> >> I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). >> >> Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. >> >> ### Testing >> - [x] tier1 (x86_64, aarch64, x86_32) >> - [x] tier2 (x86_64, aarch64) >> - [x] tier3 (x86_64, aarch64) >> - [x] tier4 (x86_64, aarch64) > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/fast-locking' into fast-locking > - Re-use r0 in call to unlock_object() Regarding benchmarks, is it possible to get some indication what fast-locking+lillput result will be? FinagleHttp seems to suffer a bit, will Lillput give some/all of that back, or more? 
------------- PR: https://git.openjdk.org/jdk/pull/10590 From cjplummer at openjdk.org Tue Oct 11 21:34:05 2022 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 11 Oct 2022 21:34:05 GMT Subject: RFR: 8294486: Remove vmTestbase/nsk/jvmti/ tests ported to serviceability/jvmti. In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 19:00:11 GMT, Leonid Mesnik wrote: > The fix removes nsk/jvmti/ tests ported to serviceability/jvmti and forward-ports the corresponding fixes. The suspend/resume tests require more work covered by https://bugs.openjdk.org/browse/JDK-8295169. Can you update the CR to point to the CR that did the porting? ------------- PR: https://git.openjdk.org/jdk/pull/10665 From lmesnik at openjdk.org Tue Oct 11 21:58:03 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 11 Oct 2022 21:58:03 GMT Subject: RFR: 8294486: Remove vmTestbase/nsk/jvmti/ tests ported to serviceability/jvmti. In-Reply-To: References: Message-ID: <8QCz10mxVSBVI-ueu0eGh5xuCIqjMi122N8OdxM_5E4=.4fed85c4-9e53-4f4e-8b16-70bcdb346c34@github.com> On Tue, 11 Oct 2022 19:00:11 GMT, Leonid Mesnik wrote: > The fix removes nsk/jvmti/ tests ported to serviceability/jvmti and forward-ports the corresponding fixes. The suspend/resume tests require more work covered by https://bugs.openjdk.org/browse/JDK-8295169. The tests were ported as a part of Loom. I added the tests which were ported to the bug. ------------- PR: https://git.openjdk.org/jdk/pull/10665 From dholmes at openjdk.org Wed Oct 12 00:19:11 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Oct 2022 00:19:11 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL In-Reply-To: References: Message-ID: <_mlD5HyVj54tSXfOJY7bQi7NHxiwk2hhRpCkja2xHfg=.890bcecc-64dc-4d41-a656-f27aef4cea29@github.com> On Tue, 11 Oct 2022 09:21:46 GMT, Johan Sjölen wrote: > Hi! > > This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. Hi Johan, I have a few comments/suggestions about some of this. Thanks. src/hotspot/share/runtime/deoptimization.cpp line 441: > 439: if (trap_scope->rethrow_exception()) { > 440: #ifndef PRODUCT > 441: log_debug(deoptimization)("Exception to be rethrown in the interpreter for method %s::%s at bci %d", trap_scope->method()->method_holder()->name()->as_C_string(), trap_scope->method()->name()->as_C_string(), trap_scope->bci()); While you are here could you break this up into three lines please. src/hotspot/share/runtime/deoptimization.cpp line 1472: > 1470: { > 1471: LogMessage(deoptimization) lm; > 1472: if (lm.is_debug()) { Should this be trace level to match the fact you also needed Verbose before? src/hotspot/share/runtime/vframe.cpp line 681: > 679: #ifndef PRODUCT > 680: void vframe::print() { > 681: if (WizardMode) print_on(tty); You don't need the `WizardMode` guard here and in `print_on`. UL is supposed to replace WizardMode and Verbose so the correct fix would be to elide it from `print_on` and only log using `print_on` when the logging level is logically equivalent to `WizardMode` (as you have done elsewhere). src/hotspot/share/runtime/vframe.hpp line 113: > 111: virtual void print_value() const; > 112: virtual void print(); > 113: void print_on(outputStream* st) const override; Unclear if this should also be virtual.
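On the `override` question just above: in C++ a member declared `override` must match a virtual function in a base class, and is therefore implicitly virtual itself, so repeating `virtual` is optional. A minimal generic illustration (plain toy types, not the actual vframe hierarchy):

```c++
#include <cstdio>

struct Printable {
  virtual void print_on(int indent) const { std::printf("%*sPrintable\n", indent, ""); }
  virtual ~Printable() = default;
};

struct Frame : Printable {
  // 'override' only compiles if a matching virtual function exists in a base
  // class, so this member is implicitly virtual even without the keyword.
  void print_on(int indent) const override { std::printf("%*sFrame\n", indent, ""); }
};
```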
src/hotspot/share/runtime/vframeArray.cpp line 380: > 378: #ifndef PRODUCT > 379: log_debug(deoptimization)("Locals size: %d", locals()->size()); > 380: auto log_it = [](int i, intptr_t* addr) { Again I don't see why this can't just be inline test/hotspot/gtest/logging/test_logStream.cpp line 158: > 156: EXPECT_TRUE(file_contains_substring(TestLogFileName, "ABCD\n")); > 157: } > 158: Inadvertent removal? test/hotspot/jtreg/compiler/uncommontrap/TestDeoptOOM.java line 45: > 43: * -XX:CompileCommand=exclude,compiler.uncommontrap.TestDeoptOOM::m9_1 > 44: * -XX:+UnlockDiagnosticVMOptions > 45: * -XX:+UseZGC -XX:+LogCompilation -XX:+TraceDeoptimization -XX:+Verbose Why isn't this enabling deoptimization logging? ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/10645 From dholmes at openjdk.org Wed Oct 12 00:19:12 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Oct 2022 00:19:12 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL In-Reply-To: References: Message-ID: <-gHZi1XpPZBoUl_RS2ETKpgODYvyiUvz5hi2AQlMoDQ=.d0a31f28-04e4-4ae9-b8f4-76415a154853@github.com> On Tue, 11 Oct 2022 09:22:24 GMT, Johan Sj?len wrote: >> Hi! >> >> This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. > > src/hotspot/share/runtime/vframeArray.cpp line 338: > >> 336: >> 337: #ifndef PRODUCT >> 338: auto log_deopt = [](int i, intptr_t* addr) { > > This usage of lambdas is needed to avoid crossing initialization in the switch statement. ?? I only see one usage in the switch statement so don't understand why this is not inline as normal logging code would be. ------------- PR: https://git.openjdk.org/jdk/pull/10645 From dholmes at openjdk.org Wed Oct 12 00:26:04 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Oct 2022 00:26:04 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Tue, 11 Oct 2022 16:02:41 GMT, Andrew Haley wrote: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. test/hotspot/jtreg/runtime/jni/TestDenormalFloat/libfast-math.c line 24: > 22: */ > 23: > 24: A comment as to why this file exists will help future maintainers. 
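For context on "save and restore the floating-point control word around System.loadLibrary()" discussed in this thread: the gist of such a workaround can be sketched with the standard <cfenv> interface, as below. This is only an illustration of the idea under the assumption that the affected state is part of the saved floating-point environment (it includes MXCSR on typical x86 implementations); the actual JDK change may hook a different layer and the helper name is made up here.

```c++
#include <cfenv>
#include <dlfcn.h>

// A DSO built with -ffast-math can run a GCC-installed constructor that sets
// FTZ/DAZ when it is loaded, so capture the floating-point environment before
// dlopen() and put it back afterwards. Error handling elided.
void* dlopen_preserving_fpu(const char* path, int mode) {
  fenv_t saved;
  fegetenv(&saved);                  // snapshot control/status state
  void* handle = dlopen(path, mode);
  fesetenv(&saved);                  // undo any change made by library constructors
  return handle;
}
```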
------------- PR: https://git.openjdk.org/jdk/pull/10661 From dholmes at openjdk.org Wed Oct 12 00:41:05 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Oct 2022 00:41:05 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Tue, 11 Oct 2022 16:02:41 GMT, Andrew Haley wrote: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. This appears to be a 10-year-old bug in gcc. Have we ever had any issues reported because of this? Inserting a workaround now seems rather late. Any FP-using native code can potentially break Java's FP semantics if it messes with the FPU control word (ref the old Borland compilers). Shouldn't any workaround only be needed for the internals of `System.loadLibrary` as other JDK usages of `dlopen` should know what they are opening and that they are libraries that don't have this problem? ------------- PR: https://git.openjdk.org/jdk/pull/10661 From haosun at openjdk.org Wed Oct 12 02:04:45 2022 From: haosun at openjdk.org (Hao Sun) Date: Wed, 12 Oct 2022 02:04:45 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 09:28:59 GMT, Andrew Haley wrote: >> In this patch, we implement functions histogram_bytecode() and histogram_bytecode_pair() for interpreter AArch64 part. Similar to count_bytecode(), we use atomic operations to update the counters as well. >> >> Here shows part of the message produced with -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options after this patch.
>> >> >> $ java -XX:+PrintBytecodeHistogram --version | head -20 >> openjdk 20-internal 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) >> >> Histogram of 5004099 executed bytecodes: >> >> absolute relative code name >> ---------------------------------------------------------------------- >> 319124 6.38% dc fast_aload_0 >> 313397 6.26% e0 fast_iload >> 251436 5.02% b6 invokevirtual >> 227428 4.54% 19 aload >> 166054 3.32% a7 goto >> 159167 3.18% 2b aload_1 >> 151803 3.03% de fast_aaccess_0 >> 136787 2.73% 1b iload_1 >> 124037 2.48% 36 istore >> 118791 2.37% 84 iinc >> 118121 2.36% 1c iload_2 >> 110484 2.21% a2 if_icmpge >> >> $ java -XX:+PrintBytecodePairHistogram --version | head -20 >> openjdk 20-internal 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) >> >> Histogram of 4804441 executed bytecode pairs: >> >> absolute relative codes 1st bytecode 2nd bytecode >> ---------------------------------------------------------------------- >> 77602 1.615% 84 a7 iinc goto >> 49749 1.035% 36 e0 istore fast_iload >> 48931 1.018% e0 10 fast_iload bipush >> 46294 0.964% e0 b6 fast_iload invokevirtual >> 42661 0.888% a7 e0 goto fast_iload >> 42243 0.879% 3a 19 astore aload >> 40138 0.835% 19 b9 aload invokeinterface >> 36617 0.762% dc 2b fast_aload_0 aload_1 >> 35745 0.744% b7 dc invokespecial fast_aload_0 >> 35384 0.736% 19 b6 aload invokevirtual >> 35035 0.729% b6 de invokevirtual fast_aaccess_0 >> 34667 0.722% dc b6 fast_aload_0 invokevirtual >> >> >> In order to verfiy the correctness, I took the trace information produced by -XX:+TraceBytecodes as a cross reference. The hit times for some bytecodes/bytecode pairs can be obtained via parsing the trace. Then I compared the hit times with the corresponding "absolute" columns. I randomly selected several bytecodes/bytecode pairs, and the manual comparion results showed that "absolute" columns are correct. >> >> Note-1: count_bytecode() is updated. 1) caller-saved registers are used as temporary registers and there is no need to save/restore them. 2) atomic_addw() should be used since the counter is of int type. >> >> Note-2: As shown by the update in file templateInterpreterGenerator.cpp, function histogram_bytecode() should be invoked only inside !PRODUCT scope. > > src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1981: > >> 1979: >> 1980: void TemplateInterpreterGenerator::count_bytecode() { >> 1981: Register rscratch3 = r10; > > Please pass the scratch register to use as an argument to `TemplateInterpreterGenerator::generate_trace_code` Thanks for your review. But I'm afraid I didn't fully understand it. Why `generate_trace_code` is involved? I guess you mean `count_bytecode()`? But `count_bytecode()` is invoked [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L362), and I don't think it's a proper site to pass arch-specific register `r10` to the general `count_bytecode()`. Please correct me if I misunderstood. Thanks. 
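As background for the count_bytecode()/histogram_bytecode_pair() discussion above, the counting scheme itself is simple; a scalar C++ sketch of what the generated template-interpreter code effectively does is shown below. Constants, names and the exact way the pair index is formed are illustrative assumptions; the real tables and the rolling index live in BytecodeHistogram/BytecodePairHistogram and the generated code updates them with atomic instructions, as discussed in the review.

```c++
#include <atomic>

constexpr int LOG2_NUMBER_OF_CODES = 8;                     // illustrative: 256 opcode slots
constexpr int NUMBER_OF_PAIRS = 1 << (2 * LOG2_NUMBER_OF_CODES);

std::atomic<int> pair_counters[NUMBER_OF_PAIRS];            // static storage, starts at zero
std::atomic<int> last_code{0};

// Called once per executed bytecode with its opcode: combine the previous and
// the current opcode into one table index and bump that counter atomically.
void histogram_bytecode_pair(int current_code) {
  int previous = last_code.exchange(current_code);
  int index = (current_code << LOG2_NUMBER_OF_CODES) | previous;
  pair_counters[index].fetch_add(1, std::memory_order_relaxed);
}
```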
> src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 2008: > >> 2006: BytecodePairHistogram::log2_number_of_codes); >> 2007: __ stxrw(rscratch2, index, index_addr); >> 2008: __ cbnzw(rscratch2, L); // retry to load _index > > Please add `atomic_ldorrw` to the list of `ATOMIC_OP`s (in macroAssembler_aarch64.cpp) and use it here. Agree. Will update. ------------- PR: https://git.openjdk.org/jdk/pull/10642 From eliu at openjdk.org Wed Oct 12 05:51:04 2022 From: eliu at openjdk.org (Eric Liu) Date: Wed, 12 Oct 2022 05:51:04 GMT Subject: RFR: 8293409: [vectorapi] Intrinsify VectorSupport.indexVector In-Reply-To: References: Message-ID: On Mon, 19 Sep 2022 08:51:24 GMT, Xiaohong Gong wrote: > "`VectorSupport.indexVector()`" is used to compute a vector that contains the index values based on a given vector and a scale value (`i.e. index = vec + iota * scale`). This function is widely used in other APIs like "`VectorMask.indexInRange`" which is useful to the tail loop vectorization. And it can be easily implemented with the vector instructions. > > This patch adds the vector intrinsic implementation of it. The steps are: > > 1) Load the const "iota" vector. > > We extend the "`vector_iota_indices`" stubs from byte to other integral types. For floating point vectors, it needs an additional vector cast to get the right iota values. > > 2) Compute indexes with "`vec + iota * scale`" > > Here is the performance result to the new added micro benchmark on ARM NEON: > > Benchmark Gain > IndexVectorBenchmark.byteIndexVector 1.477 > IndexVectorBenchmark.doubleIndexVector 5.031 > IndexVectorBenchmark.floatIndexVector 5.342 > IndexVectorBenchmark.intIndexVector 5.529 > IndexVectorBenchmark.longIndexVector 3.177 > IndexVectorBenchmark.shortIndexVector 5.841 > > > Please help to review and share the feedback! Thanks in advance! AArch64 part looks good to me. ------------- Marked as reviewed by eliu (Committer). PR: https://git.openjdk.org/jdk/pull/10332 From fyang at openjdk.org Wed Oct 12 06:45:03 2022 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Oct 2022 06:45:03 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 05:32:51 GMT, Xiaolin Zheng wrote: > This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. > > Chaining PR #10421. > > 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] > 2. Performance: conservatively no regressions observed. [3] > > Several details: > 1. The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. > 2. I kept two different lambda flavours because I think > > __ la_patchable(t0, RuntimeAddress(dest), [&] (int32_t off) { > __ jalr(x1, t0, off); > }); > > might make programmers overlook the `__ jalr()` inside the lambda. So I made the code with jalr aligning the previous style: > > __ la_patchable(t0, RuntimeAddress(dest), [&] (int32_t off) { > __ jalr(x1, t0, off);}); > > But well maybe it's up to the reviewers' flavor and I am okay with any case. > > > Having tested several times hotspot tier1~tier4; Testing another turn on board. 
> > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html > [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html > [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html Personally, I prefer the following style: __ relocate(spec, [&] { int32_t off = 0; la_patchable(t0, RuntimeAddress(entry), off); jalr(x1, t0, off); }); Then the code looks more unified to me. And we don't need to extend a new la_patchable interface. ------------- PR: https://git.openjdk.org/jdk/pull/10643 From haosun at openjdk.org Wed Oct 12 07:50:15 2022 From: haosun at openjdk.org (Hao Sun) Date: Wed, 12 Oct 2022 07:50:15 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v2] In-Reply-To: References: Message-ID: > In this patch, we implement functions histogram_bytecode() and histogram_bytecode_pair() for interpreter AArch64 part. Similar to count_bytecode(), we use atomic operations to update the counters as well. > > Here shows part of the message produced with -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options after this patch. > > > $ java -XX:+PrintBytecodeHistogram --version | head -20 > openjdk 20-internal 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) > > Histogram of 5004099 executed bytecodes: > > absolute relative code name > ---------------------------------------------------------------------- > 319124 6.38% dc fast_aload_0 > 313397 6.26% e0 fast_iload > 251436 5.02% b6 invokevirtual > 227428 4.54% 19 aload > 166054 3.32% a7 goto > 159167 3.18% 2b aload_1 > 151803 3.03% de fast_aaccess_0 > 136787 2.73% 1b iload_1 > 124037 2.48% 36 istore > 118791 2.37% 84 iinc > 118121 2.36% 1c iload_2 > 110484 2.21% a2 if_icmpge > > $ java -XX:+PrintBytecodePairHistogram --version | head -20 > openjdk 20-internal 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) > > Histogram of 4804441 executed bytecode pairs: > > absolute relative codes 1st bytecode 2nd bytecode > ---------------------------------------------------------------------- > 77602 1.615% 84 a7 iinc goto > 49749 1.035% 36 e0 istore fast_iload > 48931 1.018% e0 10 fast_iload bipush > 46294 0.964% e0 b6 fast_iload invokevirtual > 42661 0.888% a7 e0 goto fast_iload > 42243 0.879% 3a 19 astore aload > 40138 0.835% 19 b9 aload invokeinterface > 36617 0.762% dc 2b fast_aload_0 aload_1 > 35745 0.744% b7 dc invokespecial fast_aload_0 > 35384 0.736% 19 b6 aload invokevirtual > 35035 0.729% b6 de invokevirtual fast_aaccess_0 > 34667 0.722% dc b6 fast_aload_0 invokevirtual > > > In order to verfiy the correctness, I took the trace information produced by -XX:+TraceBytecodes as a cross reference. The hit times for some bytecodes/bytecode pairs can be obtained via parsing the trace. Then I compared the hit times with the corresponding "absolute" columns. I randomly selected several bytecodes/bytecode pairs, and the manual comparion results showed that "absolute" columns are correct. > > Note-1: count_bytecode() is updated. 1) caller-saved registers are used as temporary registers and there is no need to save/restore them. 2) atomic_addw() should be used since the counter is of int type. 
> > Note-2: As shown by the update in file templateInterpreterGenerator.cpp, function histogram_bytecode() should be invoked only inside !PRODUCT scope. Hao Sun has updated the pull request incrementally with one additional commit since the last revision: Introduce atomic_orrw Introduce atomic_orrw() function as suggested by aph. Besides, remove atomic_incw(). It's dead code. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10642/files - new: https://git.openjdk.org/jdk/pull/10642/files/7e8b738a..0db39758 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10642&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10642&range=00-01 Stats: 52 lines in 3 files changed: 20 ins; 28 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10642.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10642/head:pull/10642 PR: https://git.openjdk.org/jdk/pull/10642 From haosun at openjdk.org Wed Oct 12 07:59:07 2022 From: haosun at openjdk.org (Hao Sun) Date: Wed, 12 Oct 2022 07:59:07 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v2] In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 01:41:11 GMT, Hao Sun wrote: >> src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 2008: >> >>> 2006: BytecodePairHistogram::log2_number_of_codes); >>> 2007: __ stxrw(rscratch2, index, index_addr); >>> 2008: __ cbnzw(rscratch2, L); // retry to load _index >> >> Please add `atomic_ldorrw` to the list of `ATOMIC_OP`s (in macroAssembler_aarch64.cpp) and use it here. > > Agree. Will update. Updated in the latest commit. Please help take another look. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10642 From aph at openjdk.org Wed Oct 12 08:11:07 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 08:11:07 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v2] In-Reply-To: References: Message-ID: <9S_HscTv3N7DDbd6YbWyxhLGZxDsxj-9ADVvxBFVVYs=.b953eaf3-b154-4f60-8a8a-c52f3b20b179@github.com> On Wed, 12 Oct 2022 02:01:29 GMT, Hao Sun wrote: >> src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1981: >> >>> 1979: >>> 1980: void TemplateInterpreterGenerator::count_bytecode() { >>> 1981: Register rscratch3 = r10; >> >> Please pass the scratch register to use as an argument to `TemplateInterpreterGenerator::generate_trace_code` > > Thanks for your review. But I'm afraid I didn't fully understand it. > > Why `generate_trace_code` is involved? I guess you mean `count_bytecode()`? > But `count_bytecode()` is invoked [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L362), and I don't think it's a proper site to pass arch-specific register `r10` to the general `count_bytecode()`. > > Please correct me if I misunderstood. Thanks. Ah, I see what you mean. OK, just delete the name `rscratch3` and pass `r10` to `atomic_addw`. That simplifies the code to good effect. 
------------- PR: https://git.openjdk.org/jdk/pull/10642 From xlinzheng at openjdk.org Wed Oct 12 08:20:02 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 12 Oct 2022 08:20:02 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible [v2] In-Reply-To: References: Message-ID: > This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. > > Chaining PR #10421. > > 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] > 2. Performance: conservatively no regressions observed. [3] > > Several details: > 1. The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. > 2. I kept two different lambda flavours because I think > > __ la_patchable(t0, RuntimeAddress(dest), [&] (int32_t off) { > __ jalr(x1, t0, off); > }); > > might make programmers overlook the `__ jalr()` inside the lambda. So I made the code with jalr aligning the previous style: > > __ la_patchable(t0, RuntimeAddress(dest), [&] (int32_t off) { > __ jalr(x1, t0, off);}); > > But well maybe it's up to the reviewers' flavor and I am okay with any case. > > > Having tested several times hotspot tier1~tier4; Testing another turn on board. > > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html > [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html > [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: Change the style as to comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10643/files - new: https://git.openjdk.org/jdk/pull/10643/files/480cc213..30984760 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10643&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10643&range=00-01 Stats: 220 lines in 10 files changed: 129 ins; 9 del; 82 mod Patch: https://git.openjdk.org/jdk/pull/10643.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10643/head:pull/10643 PR: https://git.openjdk.org/jdk/pull/10643 From xlinzheng at openjdk.org Wed Oct 12 08:25:25 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 12 Oct 2022 08:25:25 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 06:40:04 GMT, Fei Yang wrote: > Personally, I prefer the following style: > > ``` > __ relocate(spec, [&] { > int32_t off = 0; > la_patchable(t0, RuntimeAddress(entry), off); > jalr(x1, t0, off); > }); > ``` > > Then the code looks more unified to me. And we don't need to extend a new la_patchable interface. Done - code expands a little for the `relocate()` inside `la_patchable()` is extracted to their outer callers explicitly. The `la_patchable` only appears ~30 times in the backend so it's controllable. 
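To illustrate the shape of the interface being discussed in this thread, here is a sketch only; the real `relocate()` and `IncompressibleRegion` live in the RISC-V MacroAssembler and differ in detail, and `ToyAssembler` is a made-up stand-in. The relocation call takes a lambda that emits the patchable instructions, and an RAII region keeps the assembler from emitting 2-byte RVC forms inside it.

```c++
// Everything emitted inside the lambda is covered by an RAII guard that
// temporarily turns off RVC compression, so patchable instruction sequences
// keep a fixed 4-byte layout that later patching code can rely on.
class ToyAssembler {
public:
  bool compressible() const { return _compressible; }

  class IncompressibleRegion {
  public:
    explicit IncompressibleRegion(ToyAssembler* masm)
      : _masm(masm), _saved(masm->_compressible) { _masm->_compressible = false; }
    ~IncompressibleRegion() { _masm->_compressible = _saved; }
  private:
    ToyAssembler* _masm;
    bool _saved;
  };

  template <typename EmitFn>
  void relocate(int /*reloc_spec*/, EmitFn emit) {
    IncompressibleRegion ir(this);  // relocated code must not be compressed
    // ... record the relocation at the current code offset ...
    emit();                         // e.g. an auipc/jalr sequence
  }

private:
  bool _compressible = true;
};
```

Call sites would then follow the style suggested earlier in the thread, e.g. `masm.relocate(spec, [&] { /* emit auipc + jalr */ });`.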
------------- PR: https://git.openjdk.org/jdk/pull/10643 From jsjolen at openjdk.org Wed Oct 12 08:31:07 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 12 Oct 2022 08:31:07 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL In-Reply-To: <-gHZi1XpPZBoUl_RS2ETKpgODYvyiUvz5hi2AQlMoDQ=.d0a31f28-04e4-4ae9-b8f4-76415a154853@github.com> References: <-gHZi1XpPZBoUl_RS2ETKpgODYvyiUvz5hi2AQlMoDQ=.d0a31f28-04e4-4ae9-b8f4-76415a154853@github.com> Message-ID: On Wed, 12 Oct 2022 00:04:01 GMT, David Holmes wrote: >> src/hotspot/share/runtime/vframeArray.cpp line 338: >> >>> 336: >>> 337: #ifndef PRODUCT >>> 338: auto log_deopt = [](int i, intptr_t* addr) { >> >> This usage of lambdas is needed to avoid crossing initialization in the switch statement. > > ?? I only see one usage in the switch statement so don't understand why this is not inline as normal logging code would be. I should've said "crosses initialization error", not "crossing." Check out this SO question: https://stackoverflow.com/questions/11578936/getting-a-bunch-of-crosses-initialization-error So `LogTarget(Debug, deoptimization) lt;` is the error here. I *think* that we can inline it if we introduce a surrounding scope, is that preferable to you? So: case T_OBJECT: *addr = value->get_int(T_OBJECT); { // Scope off LogTarget LogTarget(Debug, deoptimization) lt; if (lt.is_enabled()) { LogStream ls(lt); ls.print(" - Reconstructed expression %d (OBJECT): ", i); oop o = cast_to_oop((address)(*addr)); if (o == NULL) { ls.print_cr("NULL"); } else { ResourceMark rm; ls.print_raw_cr(o->klass()->name()->as_C_string()); } } } ------------- PR: https://git.openjdk.org/jdk/pull/10645 From jsjolen at openjdk.org Wed Oct 12 08:37:13 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 12 Oct 2022 08:37:13 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL In-Reply-To: <_mlD5HyVj54tSXfOJY7bQi7NHxiwk2hhRpCkja2xHfg=.890bcecc-64dc-4d41-a656-f27aef4cea29@github.com> References: <_mlD5HyVj54tSXfOJY7bQi7NHxiwk2hhRpCkja2xHfg=.890bcecc-64dc-4d41-a656-f27aef4cea29@github.com> Message-ID: On Wed, 12 Oct 2022 00:01:31 GMT, David Holmes wrote: >> Hi! >> >> This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. > > src/hotspot/share/runtime/vframe.hpp line 113: > >> 111: virtual void print_value() const; >> 112: virtual void print(); >> 113: void print_on(outputStream* st) const override; > > Unclear if this should also be virtual. This is virtual, taken from `ResourceObj`. The `override` indicates this, I'm basing this on: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rh-override ------------- PR: https://git.openjdk.org/jdk/pull/10645 From rrich at openjdk.org Wed Oct 12 08:50:02 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 12 Oct 2022 08:50:02 GMT Subject: RFR: 8294580: frame::interpreter_frame_print_on() crashes if free BasicObjectLock exists in frame In-Reply-To: References: Message-ID: On Thu, 29 Sep 2022 12:49:27 GMT, Richard Reingruber wrote: > Add null check before dereferencing BasicObjectLock::_obj. > BasicObjectLocks are marked as free by setting _obj to null. > > I've done manual testing: > > > ./images/jdk/bin/java -Xlog:continuations=trace -XX:+VerifyContinuations --enable-preview VTSleepAfterUnlock > > > with the test attached to the JBS item. 
> > Example output: > > > [0.349s][trace][continuations] Interpreted frame (sp=0x000000011d5c6398 unextended sp=0x000000011d5c63b8, fp=0x000000011d5c6420, real_fp=0x000000011d5c6420, pc=0x00007f0ff0199c6a) > [0.349s][trace][continuations] ~return entry points [0x00007f0ff0199820, 0x00007f0ff019a2e8] 2760 bytes > [0.349s][trace][continuations] - local [0x000000011d5c3550]; #0 > [0.349s][trace][continuations] - local [0x000000011d5c3550]; #1 > [0.349s][trace][continuations] - local [0x0000000000000000]; #2 > [0.349s][trace][continuations] - stack [0x0000000000000064]; #1 > [0.349s][trace][continuations] - stack [0x0000000000000000]; #0 > [0.349s][trace][continuations] - obj [null] > [0.349s][trace][continuations] - lock [monitor mark(is_neutral no_hash age=0)] > [0.349s][trace][continuations] - monitor[0x000000011d5c63d8] > [0.349s][trace][continuations] - bcp [0x00007f0fa8400401]; @17 > [0.349s][trace][continuations] - locals [0x000000011d5c6440] > [0.349s][trace][continuations] - method [0x00007f0fa8400430]; virtual void VTSleepAfterUnlock.sleepAfterUnlock() Thanks for the reviews! ------------- PR: https://git.openjdk.org/jdk/pull/10486 From rrich at openjdk.org Wed Oct 12 08:51:32 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 12 Oct 2022 08:51:32 GMT Subject: Integrated: 8294580: frame::interpreter_frame_print_on() crashes if free BasicObjectLock exists in frame In-Reply-To: References: Message-ID: On Thu, 29 Sep 2022 12:49:27 GMT, Richard Reingruber wrote: > Add null check before dereferencing BasicObjectLock::_obj. > BasicObjectLocks are marked as free by setting _obj to null. > > I've done manual testing: > > > ./images/jdk/bin/java -Xlog:continuations=trace -XX:+VerifyContinuations --enable-preview VTSleepAfterUnlock > > > with the test attached to the JBS item. > > Example output: > > > [0.349s][trace][continuations] Interpreted frame (sp=0x000000011d5c6398 unextended sp=0x000000011d5c63b8, fp=0x000000011d5c6420, real_fp=0x000000011d5c6420, pc=0x00007f0ff0199c6a) > [0.349s][trace][continuations] ~return entry points [0x00007f0ff0199820, 0x00007f0ff019a2e8] 2760 bytes > [0.349s][trace][continuations] - local [0x000000011d5c3550]; #0 > [0.349s][trace][continuations] - local [0x000000011d5c3550]; #1 > [0.349s][trace][continuations] - local [0x0000000000000000]; #2 > [0.349s][trace][continuations] - stack [0x0000000000000064]; #1 > [0.349s][trace][continuations] - stack [0x0000000000000000]; #0 > [0.349s][trace][continuations] - obj [null] > [0.349s][trace][continuations] - lock [monitor mark(is_neutral no_hash age=0)] > [0.349s][trace][continuations] - monitor[0x000000011d5c63d8] > [0.349s][trace][continuations] - bcp [0x00007f0fa8400401]; @17 > [0.349s][trace][continuations] - locals [0x000000011d5c6440] > [0.349s][trace][continuations] - method [0x00007f0fa8400430]; virtual void VTSleepAfterUnlock.sleepAfterUnlock() This pull request has now been integrated. 
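[Editorial illustration] The essence of the fix described above is a null guard before the dereference, since a BasicObjectLock whose `_obj` is null denotes a freed slot (compare the `- obj [null]` line in the example output). The sketch below is a minimal standalone model of that guard with made-up toy types; the real code operates on `BasicObjectLock` and `oop` inside `frame::interpreter_frame_print_on()`.

```
// Minimal standalone model (not the actual HotSpot types): a lock slot whose
// object pointer is null marks a freed BasicObjectLock, so printing code must
// check for null before dereferencing it.
#include <cstdio>

struct ToyOop {
  const char* klass_name;
  void print_value(FILE* out) const { std::fprintf(out, "%s", klass_name); }
};

struct ToyBasicObjectLock {
  ToyOop* obj;   // null means the slot is free (the monitor was released)
};

void print_monitor(const ToyBasicObjectLock* mon, FILE* out) {
  std::fprintf(out, " - obj [");
  if (mon->obj == nullptr) {
    std::fprintf(out, "null");     // freed slot: do not dereference
  } else {
    mon->obj->print_value(out);
  }
  std::fprintf(out, "]\n");
}

int main() {
  ToyOop o = { "VTSleepAfterUnlock" };
  ToyBasicObjectLock held  = { &o };
  ToyBasicObjectLock freed = { nullptr };
  print_monitor(&held, stdout);
  print_monitor(&freed, stdout);   // without the guard this dereference would crash
}
```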
Changeset: bdb4ed0f Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/bdb4ed0fb136e9e5391cfa520048de6b7f83067d Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8294580: frame::interpreter_frame_print_on() crashes if free BasicObjectLock exists in frame Reviewed-by: dholmes, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/10486 From xlinzheng at openjdk.org Wed Oct 12 08:52:30 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 12 Oct 2022 08:52:30 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible [v3] In-Reply-To: References: Message-ID: > This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. > > Chaining PR #10421. > > 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] > 2. Performance: conservatively no regressions observed. [3] > > The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. > > > Having tested several times hotspot tier1~tier4; Testing another turn on board. > > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html > [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html > [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: swap the order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10643/files - new: https://git.openjdk.org/jdk/pull/10643/files/30984760..89ca6607 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10643&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10643&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10643.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10643/head:pull/10643 PR: https://git.openjdk.org/jdk/pull/10643 From xlinzheng at openjdk.org Wed Oct 12 09:03:04 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 12 Oct 2022 09:03:04 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible [v4] In-Reply-To: References: Message-ID: <9sUAIUtTAopHVTFlSTctcWU1TEy6X24Wbvh_0OY3Yts=.9d9f6eb9-13e0-410c-97fb-8c65d828f52b@github.com> > This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. > > Chaining PR #10421. > > 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] > 2. Performance: conservatively no regressions observed. [3] > > The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. > > > Having tested several times hotspot tier1~tier4; Testing another turn on board. 
> > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html > [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html > [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: remove a dummy line, and a simple polish by the way ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10643/files - new: https://git.openjdk.org/jdk/pull/10643/files/89ca6607..edf75994 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10643&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10643&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10643.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10643/head:pull/10643 PR: https://git.openjdk.org/jdk/pull/10643 From ihse at openjdk.org Wed Oct 12 09:17:42 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 09:17:42 GMT Subject: RFR: 8295198: Update more openjdk.java.net => openjdk.org URLs Message-ID: <7EaJFlv0rkHlPIx439wEahllhClGQMg4XOjQAeEghT8=.9fa317de-73e5-4e90-a686-17e530cbda98@github.com> In [JDK-8294618](https://bugs.openjdk.org/browse/JDK-8294618), many of the old references to openjdk.java.net was updated. The test code was intentionally left out of that change, but some other instances were missed, though. This patch will fix those misses (but will still leave test code changes to a separate fix). ------------- Commit messages: - Fix https matching bug - Update java.1 from java.md - Update html version - Update openjdk.java.net -> openjdk.org Changes: https://git.openjdk.org/jdk/pull/10670/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10670&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295198 Stats: 22 lines in 6 files changed: 0 ins; 1 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/10670.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10670/head:pull/10670 PR: https://git.openjdk.org/jdk/pull/10670 From xlinzheng at openjdk.org Wed Oct 12 09:21:54 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 12 Oct 2022 09:21:54 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible [v5] In-Reply-To: References: Message-ID: <33fbgEcGKSX560jIketZj2R2zR-t9B68NfuaJyI1ffA=.d324fd6a-e0c9-49f1-817a-88e2251b897f@github.com> > This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. > > Chaining PR #10421. > > 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] > 2. Performance: conservatively no regressions observed. [3] > > The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. > > > Having tested several times hotspot tier1~tier4; Testing another turn on board. 
> > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html > [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html > [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: Keep aligning int32_t style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10643/files - new: https://git.openjdk.org/jdk/pull/10643/files/edf75994..83f3598a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10643&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10643&range=03-04 Stats: 38 lines in 10 files changed: 0 ins; 0 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/10643.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10643/head:pull/10643 PR: https://git.openjdk.org/jdk/pull/10643 From ihse at openjdk.org Wed Oct 12 09:23:07 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 09:23:07 GMT Subject: RFR: 8295198: Update more openjdk.java.net => openjdk.org URLs In-Reply-To: <7EaJFlv0rkHlPIx439wEahllhClGQMg4XOjQAeEghT8=.9fa317de-73e5-4e90-a686-17e530cbda98@github.com> References: <7EaJFlv0rkHlPIx439wEahllhClGQMg4XOjQAeEghT8=.9fa317de-73e5-4e90-a686-17e530cbda98@github.com> Message-ID: On Wed, 12 Oct 2022 09:08:55 GMT, Magnus Ihse Bursie wrote: > In [JDK-8294618](https://bugs.openjdk.org/browse/JDK-8294618), many of the old references to openjdk.java.net was updated. > > The test code was intentionally left out of that change, but some other instances were missed, though. > > This patch will fix those misses (but will still leave test code changes to a separate fix). src/java.base/share/man/java.1 line 4394: > 4392: .PP > 4393: See \f[B]CodeHeap State Analytics (OpenJDK)\f[R] > 4394: [https://bugs.openjdk.org/secure/attachment/75649/JVM_CodeHeap_StateAnalytics_V2.pdf] A reflection is that this does not seem like a good permanent URL to refer to in our specifications; but I'll leave that to someone else to fix. ------------- PR: https://git.openjdk.org/jdk/pull/10670 From ihse at openjdk.org Wed Oct 12 09:52:26 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 09:52:26 GMT Subject: RFR: 8295205: Add jcheck whitespace checking for markdown files Message-ID: Markdown files are basically source code for documentation. It should have the same whitespace checks as all other source code, so we don't get spurious trailing whitespace changes. 
------------- Commit messages: - 8295205: Add jcheck whitespace checking for markdown files Changes: https://git.openjdk.org/jdk/pull/10671/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10671&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295205 Stats: 34 lines in 9 files changed: 0 ins; 0 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/10671.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10671/head:pull/10671 PR: https://git.openjdk.org/jdk/pull/10671 From fyang at openjdk.org Wed Oct 12 09:58:05 2022 From: fyang at openjdk.org (Fei Yang) Date: Wed, 12 Oct 2022 09:58:05 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible [v5] In-Reply-To: <33fbgEcGKSX560jIketZj2R2zR-t9B68NfuaJyI1ffA=.d324fd6a-e0c9-49f1-817a-88e2251b897f@github.com> References: <33fbgEcGKSX560jIketZj2R2zR-t9B68NfuaJyI1ffA=.d324fd6a-e0c9-49f1-817a-88e2251b897f@github.com> Message-ID: On Wed, 12 Oct 2022 09:21:54 GMT, Xiaolin Zheng wrote: >> This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. >> >> Chaining PR #10421. >> >> 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] >> 2. Performance: conservatively no regressions observed. [3] >> >> The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. >> >> >> Having tested several times hotspot tier1~tier4; Testing another turn on board. >> >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html >> [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html >> [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Keep aligning int32_t style Updated change looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/10643 From shade at openjdk.org Wed Oct 12 10:37:17 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Oct 2022 10:37:17 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v9] In-Reply-To: References: <_O1386bHeagTrVI68UU0GbvMWV8XtiRE3phbjYFZ81A=.b35c61a2-af07-43f8-8fab-cb218059e465@github.com> Message-ID: On Tue, 27 Sep 2022 19:07:04 GMT, Magnus Ihse Bursie wrote: >>> @shipilev I had hoped this PR would trigger warning hunting in Hotspot, but I did not anticipate that it would happen even before it was pushed! ;-) Let me know when you are done fixing individual warnings; there seem to be little point in integrating this until these fixes has started to slow down. >> >> Well, as I dig deeper into this mess, my PRs that fix the warnings stray away from being trivial, and build changes are a small parts of them. The actual "meat" of the patches requires review. So, I am actually thinking we should integrate this PR first, wait a little, collate possible build failures due to missing disabled warnings, adding more affected files in these declarations. It would make the actual warning fix PRs more to the point: they would have to only touch the files where warnings were specifically disabled. 
(The same goes for JDK-side PRs, I think) > > @shipilev So you want me to integrate this first, and then you follow up with your fixes? Hey @magicus, are you going forward with this PR, or? ------------- PR: https://git.openjdk.org/jdk/pull/10414 From haosun at openjdk.org Wed Oct 12 10:37:18 2022 From: haosun at openjdk.org (Hao Sun) Date: Wed, 12 Oct 2022 10:37:18 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v3] In-Reply-To: References: Message-ID: <6cvdUIUEwpTzsQLQN25rjEEogFaa6hqBq2CeKv7gl14=.4fc4425b-30f4-4798-b931-5b1213ff59d9@github.com> > In this patch, we implement functions histogram_bytecode() and histogram_bytecode_pair() for interpreter AArch64 part. Similar to count_bytecode(), we use atomic operations to update the counters as well. > > Here shows part of the message produced with -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options after this patch. > > > $ java -XX:+PrintBytecodeHistogram --version | head -20 > openjdk 20-internal 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) > > Histogram of 5004099 executed bytecodes: > > absolute relative code name > ---------------------------------------------------------------------- > 319124 6.38% dc fast_aload_0 > 313397 6.26% e0 fast_iload > 251436 5.02% b6 invokevirtual > 227428 4.54% 19 aload > 166054 3.32% a7 goto > 159167 3.18% 2b aload_1 > 151803 3.03% de fast_aaccess_0 > 136787 2.73% 1b iload_1 > 124037 2.48% 36 istore > 118791 2.37% 84 iinc > 118121 2.36% 1c iload_2 > 110484 2.21% a2 if_icmpge > > $ java -XX:+PrintBytecodePairHistogram --version | head -20 > openjdk 20-internal 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) > > Histogram of 4804441 executed bytecode pairs: > > absolute relative codes 1st bytecode 2nd bytecode > ---------------------------------------------------------------------- > 77602 1.615% 84 a7 iinc goto > 49749 1.035% 36 e0 istore fast_iload > 48931 1.018% e0 10 fast_iload bipush > 46294 0.964% e0 b6 fast_iload invokevirtual > 42661 0.888% a7 e0 goto fast_iload > 42243 0.879% 3a 19 astore aload > 40138 0.835% 19 b9 aload invokeinterface > 36617 0.762% dc 2b fast_aload_0 aload_1 > 35745 0.744% b7 dc invokespecial fast_aload_0 > 35384 0.736% 19 b6 aload invokevirtual > 35035 0.729% b6 de invokevirtual fast_aaccess_0 > 34667 0.722% dc b6 fast_aload_0 invokevirtual > > > In order to verfiy the correctness, I took the trace information produced by -XX:+TraceBytecodes as a cross reference. The hit times for some bytecodes/bytecode pairs can be obtained via parsing the trace. Then I compared the hit times with the corresponding "absolute" columns. I randomly selected several bytecodes/bytecode pairs, and the manual comparion results showed that "absolute" columns are correct. > > Note-1: count_bytecode() is updated. 1) caller-saved registers are used as temporary registers and there is no need to save/restore them. 2) atomic_addw() should be used since the counter is of int type. > > Note-2: As shown by the update in file templateInterpreterGenerator.cpp, function histogram_bytecode() should be invoked only inside !PRODUCT scope. 
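[Editorial illustration] For context on what the generated interpreter code counts: conceptually there is one 32-bit counter per bytecode and one per (previous, current) bytecode pair, each bumped with an atomic 32-bit add (hence the `atomic_addw()` point above). The following is a portable C++ sketch of that bookkeeping only, not the AArch64 assembly in the patch; array names and sizes here are illustrative.

```
// Portable sketch (not the AArch64 assembly in the patch) of the bookkeeping the
// generated interpreter code performs: one atomic 32-bit counter per bytecode and
// one per bytecode pair, as printed by the two histograms.
#include <atomic>
#include <cstdint>
#include <cstdio>

constexpr int kNumCodes = 256;
std::atomic<int32_t> bytecode_count[kNumCodes];
std::atomic<int32_t> bytecode_pair_count[kNumCodes * kNumCodes];

void record(uint8_t prev_code, uint8_t cur_code) {
  // The counters are 32-bit ints, so a 32-bit atomic add is used
  // (the spirit of atomic_addw() in the discussion above).
  bytecode_count[cur_code].fetch_add(1, std::memory_order_relaxed);
  bytecode_pair_count[prev_code * kNumCodes + cur_code]
      .fetch_add(1, std::memory_order_relaxed);
}

int main() {
  // Simulate executing the pair (iinc, goto): 0x84 followed by 0xa7.
  record(0x00, 0x84);
  record(0x84, 0xa7);
  std::printf("0x84 executed %d time(s), pair (0x84, 0xa7) executed %d time(s)\n",
              bytecode_count[0x84].load(),
              bytecode_pair_count[0x84 * kNumCodes + 0xa7].load());
}
```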
Hao Sun has updated the pull request incrementally with one additional commit since the last revision: Remove rscratch3 for count_bytecode() and histogram_bytecode() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10642/files - new: https://git.openjdk.org/jdk/pull/10642/files/0db39758..bbbc3020 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10642&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10642&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10642.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10642/head:pull/10642 PR: https://git.openjdk.org/jdk/pull/10642 From haosun at openjdk.org Wed Oct 12 10:37:18 2022 From: haosun at openjdk.org (Hao Sun) Date: Wed, 12 Oct 2022 10:37:18 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v3] In-Reply-To: <9S_HscTv3N7DDbd6YbWyxhLGZxDsxj-9ADVvxBFVVYs=.b953eaf3-b154-4f60-8a8a-c52f3b20b179@github.com> References: <9S_HscTv3N7DDbd6YbWyxhLGZxDsxj-9ADVvxBFVVYs=.b953eaf3-b154-4f60-8a8a-c52f3b20b179@github.com> Message-ID: On Wed, 12 Oct 2022 08:08:38 GMT, Andrew Haley wrote: >> Thanks for your review. But I'm afraid I didn't fully understand it. >> >> Why `generate_trace_code` is involved? I guess you mean `count_bytecode()`? >> But `count_bytecode()` is invoked [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L362), and I don't think it's a proper site to pass arch-specific register `r10` to the general `count_bytecode()`. >> >> Please correct me if I misunderstood. Thanks. > > Ah, I see what you mean. > OK, just delete the name `rscratch3` and pass `r10` to `atomic_addw`. That simplifies the code to good effect. Yes. Agree. Updated. ------------- PR: https://git.openjdk.org/jdk/pull/10642 From ihse at openjdk.org Wed Oct 12 10:42:12 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 10:42:12 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v17] In-Reply-To: References: Message-ID: On Wed, 28 Sep 2022 19:52:27 GMT, Magnus Ihse Bursie wrote: >> After [JDK-8294281](https://bugs.openjdk.org/browse/JDK-8294281), it is now possible to disable warnings for individual files instead for whole libraries. I used this opportunity to go through all disabled warnings in hotspot. >> >> Any warnings that were only triggered in a few files were removed from hotspot as a whole, and changed to be only disabled for those files. >> >> Some warnings didn't trigger in any file anymore, and could just be removed. >> >> Overall, this reduced the number of disabled warnings by roughly half for gcc, clang and visual studio. The remaining warnings are sorted in "frequency", that is, the first listed warnings are triggered in the most number of files, while the last in the fewest number of files. So if anyone were to try to address the remaining warnings, it would make sense to chop of this list from the back. >> >> I believe the warnings that are disabled on a per-file basis can most likely be fixed relatively easily. >> >> I have verified this by Oracle's internal CI system, and GitHub Actions. (But I have not yet gotten a fully green run due to instabilities in GHA, however this patch can't reasonably have anything to do with that.) As always, warnings tend to differ a bit between compilers, so if someone wants to take this on a spin with some other version, please go ahead. 
If I missed some warning, in worst case we'll just have to add it back again, and in the meanwhile `configure --disable-warnings-as-errors` is an okay workaround. >> >> It also turned out that JDK-8294281 did not save the new per-file warnings in VarDeps, so I had to move $1_WARNINGS_FLAGS from $1_BASE_CFLAGS to $1_CFLAGS (and similar for C++). >> >> Annoyingly, the assert macro triggers `tautological-constant-out-of-range-compare` on clang, so while this is a single problem in a single file, this erupts all over the place in debug builds. If this can be fixed, the ugly extra `DISABLED_WARNINGS_clang += tautological-constant-out-of-range-compare` for non-release builds can be removed. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Revert jvm variants hack Yes, that is my intention. I've been away for almost two weeks, taking care of my kids and then getting ill myself. :-( I'm wading through my backlog, and intend to get back to this very soon. **Note to self** Remaining to be done: - [ ] Fix Windows warnings according to Kim's list - [ ] Check JvmOverrideFiles.gmk according to Aleksey's suggestion ------------- PR: https://git.openjdk.org/jdk/pull/10414 From ihse at openjdk.org Wed Oct 12 10:45:26 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 10:45:26 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v18] In-Reply-To: References: Message-ID: > After [JDK-8294281](https://bugs.openjdk.org/browse/JDK-8294281), it is now possible to disable warnings for individual files instead for whole libraries. I used this opportunity to go through all disabled warnings in hotspot. > > Any warnings that were only triggered in a few files were removed from hotspot as a whole, and changed to be only disabled for those files. > > Some warnings didn't trigger in any file anymore, and could just be removed. > > Overall, this reduced the number of disabled warnings by roughly half for gcc, clang and visual studio. The remaining warnings are sorted in "frequency", that is, the first listed warnings are triggered in the most number of files, while the last in the fewest number of files. So if anyone were to try to address the remaining warnings, it would make sense to chop of this list from the back. > > I believe the warnings that are disabled on a per-file basis can most likely be fixed relatively easily. > > I have verified this by Oracle's internal CI system, and GitHub Actions. (But I have not yet gotten a fully green run due to instabilities in GHA, however this patch can't reasonably have anything to do with that.) As always, warnings tend to differ a bit between compilers, so if someone wants to take this on a spin with some other version, please go ahead. If I missed some warning, in worst case we'll just have to add it back again, and in the meanwhile `configure --disable-warnings-as-errors` is an okay workaround. > > It also turned out that JDK-8294281 did not save the new per-file warnings in VarDeps, so I had to move $1_WARNINGS_FLAGS from $1_BASE_CFLAGS to $1_CFLAGS (and similar for C++). > > Annoyingly, the assert macro triggers `tautological-constant-out-of-range-compare` on clang, so while this is a single problem in a single file, this erupts all over the place in debug builds. If this can be fixed, the ugly extra `DISABLED_WARNINGS_clang += tautological-constant-out-of-range-compare` for non-release builds can be removed. 
Magnus Ihse Bursie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 29 additional commits since the last revision: - Merge branch 'master' into hotspot-warnings-per-file - Revert jvm variants hack - Revert "TESTING GHA v2: Previous attempt was botched" This reverts commit 4a51a15cd73ebc6aa96ad04f9b1eb88e2683c82f. - Add missing sub workflow inputs - Pass additional arguments to GHA runs - TESTING GHA v2: Previous attempt was botched - TESTING: enable all variants - add address for cgroupV1Subsystem_linux.cpp (thanks @shqking) Co-authored-by: Hao Sun - Add misleading-indentation for client (thanks @shqking) Co-authored-by: Hao Sun - Don't need empty-body for ad_ppc.cpp now that it is globally disabled - ... and 19 more: https://git.openjdk.org/jdk/compare/21647a39...f4f65396 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10414/files - new: https://git.openjdk.org/jdk/pull/10414/files/d6c52a2a..f4f65396 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10414&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10414&range=16-17 Stats: 27023 lines in 667 files changed: 16269 ins; 7621 del; 3133 mod Patch: https://git.openjdk.org/jdk/pull/10414.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10414/head:pull/10414 PR: https://git.openjdk.org/jdk/pull/10414 From ihse at openjdk.org Wed Oct 12 10:48:12 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 10:48:12 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v19] In-Reply-To: References: Message-ID: > After [JDK-8294281](https://bugs.openjdk.org/browse/JDK-8294281), it is now possible to disable warnings for individual files instead for whole libraries. I used this opportunity to go through all disabled warnings in hotspot. > > Any warnings that were only triggered in a few files were removed from hotspot as a whole, and changed to be only disabled for those files. > > Some warnings didn't trigger in any file anymore, and could just be removed. > > Overall, this reduced the number of disabled warnings by roughly half for gcc, clang and visual studio. The remaining warnings are sorted in "frequency", that is, the first listed warnings are triggered in the most number of files, while the last in the fewest number of files. So if anyone were to try to address the remaining warnings, it would make sense to chop of this list from the back. > > I believe the warnings that are disabled on a per-file basis can most likely be fixed relatively easily. > > I have verified this by Oracle's internal CI system, and GitHub Actions. (But I have not yet gotten a fully green run due to instabilities in GHA, however this patch can't reasonably have anything to do with that.) As always, warnings tend to differ a bit between compilers, so if someone wants to take this on a spin with some other version, please go ahead. If I missed some warning, in worst case we'll just have to add it back again, and in the meanwhile `configure --disable-warnings-as-errors` is an okay workaround. > > It also turned out that JDK-8294281 did not save the new per-file warnings in VarDeps, so I had to move $1_WARNINGS_FLAGS from $1_BASE_CFLAGS to $1_CFLAGS (and similar for C++). > > Annoyingly, the assert macro triggers `tautological-constant-out-of-range-compare` on clang, so while this is a single problem in a single file, this erupts all over the place in debug builds. 
If this can be fixed, the ugly extra `DISABLED_WARNINGS_clang += tautological-constant-out-of-range-compare` for non-release builds can be removed. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Restore msvc 4127 - conditional expression is constant ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10414/files - new: https://git.openjdk.org/jdk/pull/10414/files/f4f65396..f952fecd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10414&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10414&range=17-18 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10414.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10414/head:pull/10414 PR: https://git.openjdk.org/jdk/pull/10414 From haosun at openjdk.org Wed Oct 12 10:50:10 2022 From: haosun at openjdk.org (Hao Sun) Date: Wed, 12 Oct 2022 10:50:10 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension In-Reply-To: References: Message-ID: On Fri, 23 Sep 2022 11:13:40 GMT, Bhavana Kilambi wrote: > Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - > > eor a, a, b > eor a, a, c > > can be optimized to single instruction - `eor3 a, b, c` > > This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - > > > Benchmark gain > TestEor3.test1Int 10.87% > TestEor3.test1Long 8.84% > TestEor3.test2Int 21.68% > TestEor3.test2Long 21.04% > > > The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. test/hotspot/jtreg/compiler/vectorization/TestEor3AArch64.java line 38: > 36: * @summary Test EOR3 Neon/SVE2 instruction for aarch64 SHA3 extension > 37: * @library /test/lib / > 38: * @requires os.arch == "aarch64" & vm.cpu.features ~=".*sha3.*" Suggestion: * @requires os.arch == "aarch64" & vm.cpu.features ~= ".*sha3.*" nit: [style] it's better to have one extra space. ------------- PR: https://git.openjdk.org/jdk/pull/10407 From ihse at openjdk.org Wed Oct 12 10:55:07 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 10:55:07 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v20] In-Reply-To: References: Message-ID: > After [JDK-8294281](https://bugs.openjdk.org/browse/JDK-8294281), it is now possible to disable warnings for individual files instead for whole libraries. I used this opportunity to go through all disabled warnings in hotspot. > > Any warnings that were only triggered in a few files were removed from hotspot as a whole, and changed to be only disabled for those files. > > Some warnings didn't trigger in any file anymore, and could just be removed. 
> > Overall, this reduced the number of disabled warnings by roughly half for gcc, clang and visual studio. The remaining warnings are sorted in "frequency", that is, the first listed warnings are triggered in the most number of files, while the last in the fewest number of files. So if anyone were to try to address the remaining warnings, it would make sense to chop of this list from the back. > > I believe the warnings that are disabled on a per-file basis can most likely be fixed relatively easily. > > I have verified this by Oracle's internal CI system, and GitHub Actions. (But I have not yet gotten a fully green run due to instabilities in GHA, however this patch can't reasonably have anything to do with that.) As always, warnings tend to differ a bit between compilers, so if someone wants to take this on a spin with some other version, please go ahead. If I missed some warning, in worst case we'll just have to add it back again, and in the meanwhile `configure --disable-warnings-as-errors` is an okay workaround. > > It also turned out that JDK-8294281 did not save the new per-file warnings in VarDeps, so I had to move $1_WARNINGS_FLAGS from $1_BASE_CFLAGS to $1_CFLAGS (and similar for C++). > > Annoyingly, the assert macro triggers `tautological-constant-out-of-range-compare` on clang, so while this is a single problem in a single file, this erupts all over the place in debug builds. If this can be fixed, the ugly extra `DISABLED_WARNINGS_clang += tautological-constant-out-of-range-compare` for non-release builds can be removed. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Move disabled warnings from JvmOverrideFiles.gmk to the new per-file disabling framework (maybe-uninitialized is always turned off so skip those files) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10414/files - new: https://git.openjdk.org/jdk/pull/10414/files/f952fecd..8b4433e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10414&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10414&range=18-19 Stats: 10 lines in 2 files changed: 4 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10414.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10414/head:pull/10414 PR: https://git.openjdk.org/jdk/pull/10414 From ihse at openjdk.org Wed Oct 12 11:04:14 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 11:04:14 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v21] In-Reply-To: References: Message-ID: > After [JDK-8294281](https://bugs.openjdk.org/browse/JDK-8294281), it is now possible to disable warnings for individual files instead for whole libraries. I used this opportunity to go through all disabled warnings in hotspot. > > Any warnings that were only triggered in a few files were removed from hotspot as a whole, and changed to be only disabled for those files. > > Some warnings didn't trigger in any file anymore, and could just be removed. > > Overall, this reduced the number of disabled warnings by roughly half for gcc, clang and visual studio. The remaining warnings are sorted in "frequency", that is, the first listed warnings are triggered in the most number of files, while the last in the fewest number of files. So if anyone were to try to address the remaining warnings, it would make sense to chop of this list from the back. > > I believe the warnings that are disabled on a per-file basis can most likely be fixed relatively easily. 
> > I have verified this by Oracle's internal CI system, and GitHub Actions. (But I have not yet gotten a fully green run due to instabilities in GHA, however this patch can't reasonably have anything to do with that.) As always, warnings tend to differ a bit between compilers, so if someone wants to take this on a spin with some other version, please go ahead. If I missed some warning, in worst case we'll just have to add it back again, and in the meanwhile `configure --disable-warnings-as-errors` is an okay workaround. > > It also turned out that JDK-8294281 did not save the new per-file warnings in VarDeps, so I had to move $1_WARNINGS_FLAGS from $1_BASE_CFLAGS to $1_CFLAGS (and similar for C++). > > Annoyingly, the assert macro triggers `tautological-constant-out-of-range-compare` on clang, so while this is a single problem in a single file, this erupts all over the place in debug builds. If this can be fixed, the ugly extra `DISABLED_WARNINGS_clang += tautological-constant-out-of-range-compare` for non-release builds can be removed. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Github workflow changes were not supposed to be in this PR... ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10414/files - new: https://git.openjdk.org/jdk/pull/10414/files/8b4433e2..64c86d46 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10414&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10414&range=19-20 Stats: 60 lines in 5 files changed: 0 ins; 52 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/10414.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10414/head:pull/10414 PR: https://git.openjdk.org/jdk/pull/10414 From ihse at openjdk.org Wed Oct 12 11:04:14 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 11:04:14 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v9] In-Reply-To: References: <_O1386bHeagTrVI68UU0GbvMWV8XtiRE3phbjYFZ81A=.b35c61a2-af07-43f8-8fab-cb218059e465@github.com> Message-ID: On Wed, 12 Oct 2022 10:35:08 GMT, Aleksey Shipilev wrote: >> @shipilev So you want me to integrate this first, and then you follow up with your fixes? > > Hey @magicus, are you going forward with this PR, or? @shipilev In a "tenth time's the charm" spirit, here's what I do think is actually a PR that can be integrated. (This one was really messy, both due to me being a bit sloppy and too trigger-happy at times, and due to the complex situation of changing warnings across multiple combinations of variants, compilers, platforms, etc.) ------------- PR: https://git.openjdk.org/jdk/pull/10414 From shade at openjdk.org Wed Oct 12 11:11:07 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Oct 2022 11:11:07 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v9] In-Reply-To: References: <_O1386bHeagTrVI68UU0GbvMWV8XtiRE3phbjYFZ81A=.b35c61a2-af07-43f8-8fab-cb218059e465@github.com> Message-ID: <1JwBm4xpt9EfiFBlkharONlS8E3RjMDXq2sCtQyu0UQ=.efc05433-4767-4863-bfbc-b5cf6aba672c@github.com> On Wed, 12 Oct 2022 10:35:08 GMT, Aleksey Shipilev wrote: >> @shipilev So you want me to integrate this first, and then you follow up with your fixes? > > Hey @magicus, are you going forward with this PR, or? > @shipilev In a "tenth time's the charm" spirit, here's what I do think is actually a PR that can be integrated. Cool! I'll try to schedule the overnight build-matrix run to see if anything is broken. 
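[Editorial illustration] As an aside for readers unfamiliar with the clang warning this thread keeps mentioning: `-Wtautological-constant-out-of-range-compare` fires on comparisons whose result is fixed by the operand's type range. The generic shape of code that trips it is shown below; this example is illustrative only and is not taken from the HotSpot sources or the assert macro itself.

```
// clang: "result of comparison of constant 300 with expression of type 'uint8_t'
// is always false" [-Wtautological-constant-out-of-range-compare]
#include <cstdint>

bool out_of_range(uint8_t tag) {
  return tag == 300;   // a uint8_t can never equal 300, so this is always false
}

int main() {
  return out_of_range(42) ? 1 : 0;
}
```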
------------- PR: https://git.openjdk.org/jdk/pull/10414 From shade at openjdk.org Wed Oct 12 11:30:07 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Oct 2022 11:30:07 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 20:01:32 GMT, Roman Kennke wrote: >> This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. 
The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> ### Benchmarks >> >> All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. >> >> #### DaCapo/AArch64 >> >> Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? >> >> benchmark | baseline | fast-locking | % | size >> -- | -- | -- | -- | -- >> avrora | 27859 | 27563 | 1.07% | large >> batik | 20786 | 20847 | -0.29% | large >> biojava | 27421 | 27334 | 0.32% | default >> eclipse | 59918 | 60522 | -1.00% | large >> fop | 3670 | 3678 | -0.22% | default >> graphchi | 2088 | 2060 | 1.36% | default >> h2 | 297391 | 291292 | 2.09% | huge >> jme | 8762 | 8877 | -1.30% | default >> jython | 18938 | 18878 | 0.32% | default >> luindex | 1339 | 1325 | 1.06% | default >> lusearch | 918 | 936 | -1.92% | default >> pmd | 58291 | 58423 | -0.23% | large >> sunflow | 32617 | 24961 | 30.67% | large >> tomcat | 25481 | 25992 | -1.97% | large >> tradebeans | 314640 | 311706 | 0.94% | huge >> tradesoap | 107473 | 110246 | -2.52% | huge >> xalan | 6047 | 5882 | 2.81% | default >> zxing | 970 | 926 | 4.75% | default >> >> #### DaCapo/x86_64 >> >> The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. 
>> >> benchmark | baseline | fast-Locking | % | size >> -- | -- | -- | -- | -- >> avrora | 127690 | 126749 | 0.74% | large >> batik | 12736 | 12641 | 0.75% | large >> biojava | 15423 | 15404 | 0.12% | default >> eclipse | 41174 | 41498 | -0.78% | large >> fop | 2184 | 2172 | 0.55% | default >> graphchi | 1579 | 1560 | 1.22% | default >> h2 | 227614 | 230040 | -1.05% | huge >> jme | 8591 | 8398 | 2.30% | default >> jython | 13473 | 13356 | 0.88% | default >> luindex | 824 | 813 | 1.35% | default >> lusearch | 962 | 968 | -0.62% | default >> pmd | 40827 | 39654 | 2.96% | large >> sunflow | 53362 | 43475 | 22.74% | large >> tomcat | 27549 | 28029 | -1.71% | large >> tradebeans | 190757 | 190994 | -0.12% | huge >> tradesoap | 68099 | 67934 | 0.24% | huge >> xalan | 7969 | 8178 | -2.56% | default >> zxing | 1176 | 1148 | 2.44% | default >> >> #### Renaissance/AArch64 >> >> This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. >> >> benchmark | baseline | fast-locking | % >> -- | -- | -- | -- >> AkkaUct | 2558.832 | 2513.594 | 1.80% >> Reactors | 14715.626 | 14311.246 | 2.83% >> Als | 1851.485 | 1869.622 | -0.97% >> ChiSquare | 1007.788 | 1003.165 | 0.46% >> GaussMix | 1157.491 | 1149.969 | 0.65% >> LogRegression | 717.772 | 733.576 | -2.15% >> MovieLens | 7916.181 | 8002.226 | -1.08% >> NaiveBayes | 395.296 | 386.611 | 2.25% >> PageRank | 4294.939 | 4346.333 | -1.18% >> FjKmeans | 496.076 | 493.873 | 0.45% >> FutureGenetic | 2578.504 | 2589.255 | -0.42% >> Mnemonics | 4898.886 | 4903.689 | -0.10% >> ParMnemonics | 4260.507 | 4210.121 | 1.20% >> Scrabble | 139.37 | 138.312 | 0.76% >> RxScrabble | 320.114 | 322.651 | -0.79% >> Dotty | 1056.543 | 1068.492 | -1.12% >> ScalaDoku | 3443.117 | 3449.477 | -0.18% >> ScalaKmeans | 259.384 | 258.648 | 0.28% >> Philosophers | 24333.311 | 23438.22 | 3.82% >> ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% >> FinagleChirper | 6814.192 | 6853.38 | -0.57% >> FinagleHttp | 4762.902 | 4807.564 | -0.93% >> >> #### Renaissance/x86_64 >> >> benchmark | baseline | fast-locking | % >> -- | -- | -- | -- >> AkkaUct | 1117.185 | 1116.425 | 0.07% >> Reactors | 11561.354 | 11812.499 | -2.13% >> Als | 1580.838 | 1575.318 | 0.35% >> ChiSquare | 459.601 | 467.109 | -1.61% >> GaussMix | 705.944 | 685.595 | 2.97% >> LogRegression | 659.944 | 656.428 | 0.54% >> MovieLens | 7434.303 | 7592.271 | -2.08% >> NaiveBayes | 413.482 | 417.369 | -0.93% >> PageRank | 3259.233 | 3276.589 | -0.53% >> FjKmeans | 946.429 | 938.991 | 0.79% >> FutureGenetic | 1760.672 | 1815.272 | -3.01% >> ParMnemonics | 2016.917 | 2033.101 | -0.80% >> Scrabble | 147.996 | 150.084 | -1.39% >> RxScrabble | 177.755 | 177.956 | -0.11% >> Dotty | 673.754 | 683.919 | -1.49% >> ScalaDoku | 2193.562 | 1958.419 | 12.01% >> ScalaKmeans | 165.376 | 168.925 | -2.10% >> ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% >> Philosophers | 14268.449 | 13308.87 | 7.21% >> FinagleChirper | 4722.13 | 4688.3 | 0.72% >> FinagleHttp | 3497.241 | 3605.118 | -2.99% >> >> Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. >> >> I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. 
They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). >> >> Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. >> >> ### Testing >> - [x] tier1 (x86_64, aarch64, x86_32) >> - [x] tier2 (x86_64, aarch64) >> - [x] tier3 (x86_64, aarch64) >> - [x] tier4 (x86_64, aarch64) > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/fast-locking' into fast-locking > - Re-use r0 in call to unlock_object() Here is the basic support for RISC-V: https://cr.openjdk.java.net/~shade/8291555/riscv-patch-1.patch -- I adapted this from AArch64 changes, and tested it very lightly. @RealFYang, can I leave the testing and follow up fixes to you? ------------- PR: https://git.openjdk.org/jdk/pull/10590 From ihse at openjdk.org Wed Oct 12 11:31:26 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 11:31:26 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Tue, 11 Oct 2022 16:02:41 GMT, Andrew Haley wrote: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. I think this looks okay from a build perspective. I second David's opinion that the empty file should have a comment explaining that it exists to be compiled with a special flag. ------------- Marked as reviewed by ihse (Reviewer). PR: https://git.openjdk.org/jdk/pull/10661 From ihse at openjdk.org Wed Oct 12 11:31:27 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 11:31:27 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Tue, 11 Oct 2022 16:50:33 GMT, Andrew Haley wrote: > Also: this patch is Linux-only. I'll ask for help from build experts to make the tests GCC-only; it's not clear to me how. @theRealAph We currently have a more or less 1-to-1 relationship between OS and compiler. From a portability perspective this is not ideal, but it is also hard to keep some kind of theoretical barrier when it never is tested. Are you worried that this is a problem that occurs with gcc on other platforms as well, or that it occurs on linux with other compilers than gcc? We do not support gcc on macos, windows or aix, which leaves linux as the only gcc platform in the mainline (ports in separate repos will have to handle this themselves). We do support using clang on linux instead of gcc (but do not test it regularly). But my impression here is that this is more of a gcc-problem than a linux problem..? 
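[Editorial illustration] To make the mitigation proposed in this thread concrete: "save and restore the floating-point control word around the load" can be expressed with standard facilities roughly as below. This is only an illustrative sketch, not the patch itself (the actual change wraps the JDK's own library-loading paths, and the library name in `main` is hypothetical). On x86-64/glibc the `fenv_t` captured by `fegetenv()` includes MXCSR, which is where a fast-math DSO's startup code sets the FTZ/DAZ bits that break denormal arithmetic. As noted above, this still does not help if the loaded library later `dlopen()`s further libraries on its own.

```
// Illustrative sketch only: preserve the caller's floating-point environment
// across dlopen(), so a -ffast-math constructor in the DSO cannot silently
// change floating-point semantics for the rest of the process's Java code.
// (Link with -ldl on older glibc.)
#include <dlfcn.h>
#include <fenv.h>
#include <cstdio>

void* dlopen_preserving_fpu(const char* name, int flags) {
  fenv_t saved;
  fegetenv(&saved);                  // capture control/status state, incl. MXCSR on x86-64
  void* handle = dlopen(name, flags);
  fesetenv(&saved);                  // undo anything the DSO's init code changed
  return handle;
}

int main() {
  // "libfastmath_example.so" is a hypothetical library name for illustration.
  void* h = dlopen_preserving_fpu("libfastmath_example.so", RTLD_NOW);
  std::printf("handle: %p\n", h);
  if (h != nullptr) dlclose(h);
}
```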
------------- PR: https://git.openjdk.org/jdk/pull/10661 From ihse at openjdk.org Wed Oct 12 11:50:06 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 11:50:06 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 08:18:08 GMT, Christian Hagedorn wrote: >> The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. >> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Always read full filename and strip prefix path and only then cut filename to fit output buffer > - Merge branch 'master' into JDK-8293422 > - Merge branch 'master' into JDK-8293422 > - Review comments from Thomas > - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ > - 8293422: DWARF emitted by Clang cannot be parsed We currently require clang 3.5 (when building on linux I presume; Xcode has their very own peculiar way of versioning clang). Maybe this is too old? Should we instead raise the bar to require clang 5.0, so we can skip the test? ------------- PR: https://git.openjdk.org/jdk/pull/10287 From ihse at openjdk.org Wed Oct 12 11:54:13 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 11:54:13 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 08:18:08 GMT, Christian Hagedorn wrote: >> The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. 
As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. >> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Always read full filename and strip prefix path and only then cut filename to fit output buffer > - Merge branch 'master' into JDK-8293422 > - Merge branch 'master' into JDK-8293422 > - Review comments from Thomas > - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ > - 8293422: DWARF emitted by Clang cannot be parsed Yeah, clang 5.0 was released in 2017. I'd recommend you just hard-code in the dwarf flags and instead apply this: diff --git a/make/autoconf/toolchain.m4 b/make/autoconf/toolchain.m4 index adb4e182dcc..e1c097916d8 100644 --- a/make/autoconf/toolchain.m4 +++ b/make/autoconf/toolchain.m4 @@ -50,7 +50,7 @@ TOOLCHAIN_DESCRIPTION_microsoft="Microsoft Visual Studio" TOOLCHAIN_DESCRIPTION_xlc="IBM XL C/C++" # Minimum supported versions, empty means unspecified -TOOLCHAIN_MINIMUM_VERSION_clang="3.5" +TOOLCHAIN_MINIMUM_VERSION_clang="5.0" TOOLCHAIN_MINIMUM_VERSION_gcc="6.0" TOOLCHAIN_MINIMUM_VERSION_microsoft="19.28.0.0" # VS2019 16.8, aka MSVC 14.28 TOOLCHAIN_MINIMUM_VERSION_xlc="" ``` ------------- PR: https://git.openjdk.org/jdk/pull/10287 From aph at openjdk.org Wed Oct 12 11:58:30 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 11:58:30 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v2] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. 
> > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/af243d47..8d834739 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=00-01 Stats: 0 lines in 1 file changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From qamai at openjdk.org Wed Oct 12 11:58:47 2022 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Oct 2022 11:58:47 GMT Subject: RFR: 8292289: [vectorapi] Improve the implementation of VectorTestNode [v12] In-Reply-To: References: Message-ID: <1qzngp8Z8spVxoU3C8PxQgqkCJFw3anZqp8_mn8qI2s=.2db33f71-30cf-4365-9ba6-d05146fc8771@github.com> > This patch modifies the node generation of `VectorSupport::test` to emit a `CMoveINode`, which is picked up by `BoolNode::Ideal(PhaseGVN*, bool)` to connect the `VectorTestNode` directly to the `BoolNode`, removing the redundant operations of materialising the test result in a GP register and do a `CmpI` to get back the flags. As a result, `VectorMask::alltrue` is compiled into machine codes: > > vptest xmm0, xmm1 > jb if_true > if_false: > > instead of: > > vptest xmm0, xmm1 > setb r10 > movzbl r10 > testl r10 > jne if_true > if_false: > > The results of `jdk.incubator.vector.ArrayMismatchBenchmark` shows noticeable improvements: > > Before After > Benchmark Prefix Size Mode Cnt Score Error Score Error Units Change > ArrayMismatchBenchmark.mismatchVectorByte 0.5 9 thrpt 10 217345.383 ? 8316.444 222279.381 ? 2660.983 ops/ms +2.3% > ArrayMismatchBenchmark.mismatchVectorByte 0.5 257 thrpt 10 113918.406 ? 1618.836 116268.691 ? 1291.899 ops/ms +2.1% > ArrayMismatchBenchmark.mismatchVectorByte 0.5 100000 thrpt 10 702.066 ? 72.862 797.806 ? 16.429 ops/ms +13.6% > ArrayMismatchBenchmark.mismatchVectorByte 1.0 9 thrpt 10 146096.564 ? 2401.258 145338.910 ? 687.453 ops/ms -0.5% > ArrayMismatchBenchmark.mismatchVectorByte 1.0 257 thrpt 10 60598.181 ? 1259.397 69041.519 ? 1073.156 ops/ms +13.9% > ArrayMismatchBenchmark.mismatchVectorByte 1.0 100000 thrpt 10 316.814 ? 10.975 408.770 ? 5.281 ops/ms +29.0% > ArrayMismatchBenchmark.mismatchVectorDouble 0.5 9 thrpt 10 195674.549 ? 1200.166 188482.433 ? 1872.076 ops/ms -3.7% > ArrayMismatchBenchmark.mismatchVectorDouble 0.5 257 thrpt 10 44357.169 ? 473.013 42293.411 ? 2838.255 ops/ms -4.7% > ArrayMismatchBenchmark.mismatchVectorDouble 0.5 100000 thrpt 10 68.199 ? 5.410 67.628 ? 3.241 ops/ms -0.8% > ArrayMismatchBenchmark.mismatchVectorDouble 1.0 9 thrpt 10 107722.450 ? 1677.607 111060.400 ? 982.230 ops/ms +3.1% > ArrayMismatchBenchmark.mismatchVectorDouble 1.0 257 thrpt 10 16692.645 ? 1002.599 21440.506 ? 
1618.266 ops/ms +28.4% > ArrayMismatchBenchmark.mismatchVectorDouble 1.0 100000 thrpt 10 32.984 ? 0.548 33.202 ? 2.365 ops/ms +0.7% > ArrayMismatchBenchmark.mismatchVectorInt 0.5 9 thrpt 10 335458.217 ? 3154.842 379944.254 ? 5703.134 ops/ms +13.3% > ArrayMismatchBenchmark.mismatchVectorInt 0.5 257 thrpt 10 58505.302 ? 786.312 56721.368 ? 2497.052 ops/ms -3.0% > ArrayMismatchBenchmark.mismatchVectorInt 0.5 100000 thrpt 10 133.037 ? 11.415 139.537 ? 4.667 ops/ms +4.9% > ArrayMismatchBenchmark.mismatchVectorInt 1.0 9 thrpt 10 117943.802 ? 2281.349 112409.365 ? 2110.055 ops/ms -4.7% > ArrayMismatchBenchmark.mismatchVectorInt 1.0 257 thrpt 10 27060.015 ? 795.619 33756.613 ? 826.533 ops/ms +24.7% > ArrayMismatchBenchmark.mismatchVectorInt 1.0 100000 thrpt 10 57.558 ? 8.927 66.951 ? 4.381 ops/ms +16.3% > ArrayMismatchBenchmark.mismatchVectorLong 0.5 9 thrpt 10 182963.715 ? 1042.497 182438.405 ? 2120.832 ops/ms -0.3% > ArrayMismatchBenchmark.mismatchVectorLong 0.5 257 thrpt 10 36672.215 ? 614.821 35397.398 ? 1609.235 ops/ms -3.5% > ArrayMismatchBenchmark.mismatchVectorLong 0.5 100000 thrpt 10 66.438 ? 2.142 65.427 ? 2.270 ops/ms -1.5% > ArrayMismatchBenchmark.mismatchVectorLong 1.0 9 thrpt 10 110393.047 ? 497.853 115165.845 ? 5381.674 ops/ms +4.3% > ArrayMismatchBenchmark.mismatchVectorLong 1.0 257 thrpt 10 14720.765 ? 661.350 19871.096 ? 201.464 ops/ms +35.0% > ArrayMismatchBenchmark.mismatchVectorLong 1.0 100000 thrpt 10 30.760 ? 0.821 31.933 ? 1.352 ops/ms +3.8% > > I have not been able to conduct throughout testing on AVX512 and Aarch64 so any help would be invaluable. Thank you very much. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - Merge branch 'master' into improveVTest - redundant casts - remove untaken code paths - Merge branch 'master' into improveVTest - Merge branch 'master' into improveVTest - Merge branch 'master' into improveVTest - fix merge problems - Merge branch 'master' into improveVTest - refactor x86 - revert renaming temp - ... and 19 more: https://git.openjdk.org/jdk/compare/86ec158d...05c1b9f5 ------------- Changes: https://git.openjdk.org/jdk/pull/9855/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9855&range=11 Stats: 492 lines in 23 files changed: 212 ins; 171 del; 109 mod Patch: https://git.openjdk.org/jdk/pull/9855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9855/head:pull/9855 PR: https://git.openjdk.org/jdk/pull/9855 From stefank at openjdk.org Wed Oct 12 12:06:08 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 12 Oct 2022 12:06:08 GMT Subject: RFR: 8294238: ZGC: Move CLD claimed mark clearing [v3] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 13:48:45 GMT, Stefan Karlsson wrote: >> When we claim CLDs during object iteration, we must make sure to have a cleared set of claim bits. Today we ensure this by clearing the bits before object iteration starts. Most GCs perform this clearing during a stop-the-world pause, before the actual GC marking starts. >> >> ZGC, however, performs the clearing concurrently. This requires us to be very careful and never start following object references before the clearing has completed. >> >> In the Generational ZGC repository, we changed it so that the code that performs the object iteration cleans up and clears these bits after itself. This has the effect that when marking starts, we know that the claimed bits have been cleared. >> >> I'd like to change the single-generation ZGC to do the same. 
> > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into 8294238_zgc_move_cld_claimed_clear > - Guard verify_not_claimed with ifdef ASSERT > - 8294238: ZGC: Move CLD claimed mark clearing Thanks for the reviews! ------------- PR: https://git.openjdk.org/jdk/pull/10591 From stefank at openjdk.org Wed Oct 12 12:09:52 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 12 Oct 2022 12:09:52 GMT Subject: Integrated: 8294238: ZGC: Move CLD claimed mark clearing In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 11:20:40 GMT, Stefan Karlsson wrote: > When we claim CLDs during object iteration, we must make sure to have a cleared set of claim bits. Today we ensure this by clearing the bits before object iteration starts. Most GCs perform this clearing during a stop-the-world pause, before the actual GC marking starts. > > ZGC, however, performs the clearing concurrently. This requires us to be very careful and never start following object references before the clearing has completed. > > In the Generational ZGC repository, we changed it so that the code that performs the object iteration cleans up and clears these bits after itself. This has the effect that when marking starts, we know that the claimed bits have been cleared. > > I'd like to change the single-generation ZGC to do the same. This pull request has now been integrated. Changeset: 9cf66512 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/9cf665120291ece49c02bf490bc95ac57fbb5af4 Stats: 30 lines in 7 files changed: 28 ins; 0 del; 2 mod 8294238: ZGC: Move CLD claimed mark clearing Reviewed-by: coleenp, tschatzl, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/10591 From erikj at openjdk.org Wed Oct 12 13:17:04 2022 From: erikj at openjdk.org (Erik Joelsson) Date: Wed, 12 Oct 2022 13:17:04 GMT Subject: RFR: 8295198: Update more openjdk.java.net => openjdk.org URLs In-Reply-To: <7EaJFlv0rkHlPIx439wEahllhClGQMg4XOjQAeEghT8=.9fa317de-73e5-4e90-a686-17e530cbda98@github.com> References: <7EaJFlv0rkHlPIx439wEahllhClGQMg4XOjQAeEghT8=.9fa317de-73e5-4e90-a686-17e530cbda98@github.com> Message-ID: On Wed, 12 Oct 2022 09:08:55 GMT, Magnus Ihse Bursie wrote: > In [JDK-8294618](https://bugs.openjdk.org/browse/JDK-8294618), many of the old references to openjdk.java.net was updated. > > The test code was intentionally left out of that change, but some other instances were missed, though. > > This patch will fix those misses (but will still leave test code changes to a separate fix). Marked as reviewed by erikj (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10670 From erikj at openjdk.org Wed Oct 12 13:19:22 2022 From: erikj at openjdk.org (Erik Joelsson) Date: Wed, 12 Oct 2022 13:19:22 GMT Subject: RFR: 8295205: Add jcheck whitespace checking for markdown files In-Reply-To: References: Message-ID: <2owj2BREdvPlEo70s1rAvPywZeUPcimsej24fiAHyB4=.26f3a76b-0765-49ea-882d-591748bf1dcf@github.com> On Wed, 12 Oct 2022 09:44:54 GMT, Magnus Ihse Bursie wrote: > Markdown files are basically source code for documentation. It should have the same whitespace checks as all other source code, so we don't get spurious trailing whitespace changes. Thank you! 
Since I enabled visible whitespace in emacs (to be able to properly edit makefiles) editing any file where trailing whitespace isn't enforced makes my eyes bleed. :) ------------- Marked as reviewed by erikj (Reviewer). PR: https://git.openjdk.org/jdk/pull/10671 From erikj at openjdk.org Wed Oct 12 13:22:07 2022 From: erikj at openjdk.org (Erik Joelsson) Date: Wed, 12 Oct 2022 13:22:07 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v21] In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 11:04:14 GMT, Magnus Ihse Bursie wrote: >> After [JDK-8294281](https://bugs.openjdk.org/browse/JDK-8294281), it is now possible to disable warnings for individual files instead for whole libraries. I used this opportunity to go through all disabled warnings in hotspot. >> >> Any warnings that were only triggered in a few files were removed from hotspot as a whole, and changed to be only disabled for those files. >> >> Some warnings didn't trigger in any file anymore, and could just be removed. >> >> Overall, this reduced the number of disabled warnings by roughly half for gcc, clang and visual studio. The remaining warnings are sorted in "frequency", that is, the first listed warnings are triggered in the most number of files, while the last in the fewest number of files. So if anyone were to try to address the remaining warnings, it would make sense to chop of this list from the back. >> >> I believe the warnings that are disabled on a per-file basis can most likely be fixed relatively easily. >> >> I have verified this by Oracle's internal CI system, and GitHub Actions. (But I have not yet gotten a fully green run due to instabilities in GHA, however this patch can't reasonably have anything to do with that.) As always, warnings tend to differ a bit between compilers, so if someone wants to take this on a spin with some other version, please go ahead. If I missed some warning, in worst case we'll just have to add it back again, and in the meanwhile `configure --disable-warnings-as-errors` is an okay workaround. >> >> It also turned out that JDK-8294281 did not save the new per-file warnings in VarDeps, so I had to move $1_WARNINGS_FLAGS from $1_BASE_CFLAGS to $1_CFLAGS (and similar for C++). >> >> Annoyingly, the assert macro triggers `tautological-constant-out-of-range-compare` on clang, so while this is a single problem in a single file, this erupts all over the place in debug builds. If this can be fixed, the ugly extra `DISABLED_WARNINGS_clang += tautological-constant-out-of-range-compare` for non-release builds can be removed. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Github workflow changes were not supposed to be in this PR... Marked as reviewed by erikj (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10414 From ihse at openjdk.org Wed Oct 12 13:22:08 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 13:22:08 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v21] In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 11:04:14 GMT, Magnus Ihse Bursie wrote: >> After [JDK-8294281](https://bugs.openjdk.org/browse/JDK-8294281), it is now possible to disable warnings for individual files instead for whole libraries. I used this opportunity to go through all disabled warnings in hotspot. >> >> Any warnings that were only triggered in a few files were removed from hotspot as a whole, and changed to be only disabled for those files. 
>> >> Some warnings didn't trigger in any file anymore, and could just be removed. >> >> Overall, this reduced the number of disabled warnings by roughly half for gcc, clang and visual studio. The remaining warnings are sorted in "frequency", that is, the first listed warnings are triggered in the most number of files, while the last in the fewest number of files. So if anyone were to try to address the remaining warnings, it would make sense to chop of this list from the back. >> >> I believe the warnings that are disabled on a per-file basis can most likely be fixed relatively easily. >> >> I have verified this by Oracle's internal CI system, and GitHub Actions. (But I have not yet gotten a fully green run due to instabilities in GHA, however this patch can't reasonably have anything to do with that.) As always, warnings tend to differ a bit between compilers, so if someone wants to take this on a spin with some other version, please go ahead. If I missed some warning, in worst case we'll just have to add it back again, and in the meanwhile `configure --disable-warnings-as-errors` is an okay workaround. >> >> It also turned out that JDK-8294281 did not save the new per-file warnings in VarDeps, so I had to move $1_WARNINGS_FLAGS from $1_BASE_CFLAGS to $1_CFLAGS (and similar for C++). >> >> Annoyingly, the assert macro triggers `tautological-constant-out-of-range-compare` on clang, so while this is a single problem in a single file, this erupts all over the place in debug builds. If this can be fixed, the ugly extra `DISABLED_WARNINGS_clang += tautological-constant-out-of-range-compare` for non-release builds can be removed. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Github workflow changes were not supposed to be in this PR... Thanks. Maybe you can make some coffee of that excess 1 kWh heat? ;) ------------- PR: https://git.openjdk.org/jdk/pull/10414 From ihse at openjdk.org Wed Oct 12 13:34:32 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 13:34:32 GMT Subject: Integrated: 8295205: Add jcheck whitespace checking for markdown files In-Reply-To: References: Message-ID: <4JUC33BHEbQxyvmLXseCLGOGURYFR_6f0uHcyJaGUkc=.cb1528a9-2718-4bfc-9a70-dce6a9e4db00@github.com> On Wed, 12 Oct 2022 09:44:54 GMT, Magnus Ihse Bursie wrote: > Markdown files are basically source code for documentation. It should have the same whitespace checks as all other source code, so we don't get spurious trailing whitespace changes. This pull request has now been integrated. Changeset: 86078423 Author: Magnus Ihse Bursie URL: https://git.openjdk.org/jdk/commit/860784238ea1f3e4a817fc3c28fb89cfee7549dd Stats: 34 lines in 9 files changed: 0 ins; 0 del; 34 mod 8295205: Add jcheck whitespace checking for markdown files Reviewed-by: erikj ------------- PR: https://git.openjdk.org/jdk/pull/10671 From jwaters at openjdk.org Wed Oct 12 13:36:36 2022 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 12 Oct 2022 13:36:36 GMT Subject: RFR: 8295017: Remove Windows specific workaround in JLI_Snprintf [v4] In-Reply-To: References: Message-ID: > The C99 snprintf is available with Visual Studio 2015 and above, alongside Windows 10 and the UCRT, and is no longer identical to the outdated Windows _snprintf. Since support for the Visual C++ 2017 compiler was removed a while ago, we can now safely remove the compatibility workaround on Windows and have JLI_Snprintf simply delegate to snprintf. 
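"Simply delegate to snprintf" means the JLI wrapper no longer needs a Windows-specific emulation. A hedged sketch of that shape (illustrative only; `JLI_Snprintf_like` is a made-up name, not the actual libjli source):

```
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// With a C99-conformant snprintf/vsnprintf available on every supported
// toolchain (VS2015+ with the UCRT on Windows), the wrapper can simply
// forward: the output is truncated and NUL-terminated when size > 0.
int JLI_Snprintf_like(char* buffer, size_t size, const char* format, ...) {
  va_list args;
  va_start(args, format);
  int rc = std::vsnprintf(buffer, size, format, args);
  va_end(args);
  return rc;
}
```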
Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'openjdk:master' into patch-1 - Comment documenting change isn't required - Merge branch 'openjdk:master' into patch-1 - Comment formatting - Remove Windows specific JLI_Snprintf implementation - Remove Windows JLI_Snprintf definition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10625/files - new: https://git.openjdk.org/jdk/pull/10625/files/9149aae1..8ac9b519 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10625&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10625&range=02-03 Stats: 7257 lines in 143 files changed: 5279 ins; 912 del; 1066 mod Patch: https://git.openjdk.org/jdk/pull/10625.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10625/head:pull/10625 PR: https://git.openjdk.org/jdk/pull/10625 From kbarrett at openjdk.org Wed Oct 12 13:37:13 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 12 Oct 2022 13:37:13 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v21] In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 11:04:14 GMT, Magnus Ihse Bursie wrote: >> After [JDK-8294281](https://bugs.openjdk.org/browse/JDK-8294281), it is now possible to disable warnings for individual files instead for whole libraries. I used this opportunity to go through all disabled warnings in hotspot. >> >> Any warnings that were only triggered in a few files were removed from hotspot as a whole, and changed to be only disabled for those files. >> >> Some warnings didn't trigger in any file anymore, and could just be removed. >> >> Overall, this reduced the number of disabled warnings by roughly half for gcc, clang and visual studio. The remaining warnings are sorted in "frequency", that is, the first listed warnings are triggered in the most number of files, while the last in the fewest number of files. So if anyone were to try to address the remaining warnings, it would make sense to chop of this list from the back. >> >> I believe the warnings that are disabled on a per-file basis can most likely be fixed relatively easily. >> >> I have verified this by Oracle's internal CI system, and GitHub Actions. (But I have not yet gotten a fully green run due to instabilities in GHA, however this patch can't reasonably have anything to do with that.) As always, warnings tend to differ a bit between compilers, so if someone wants to take this on a spin with some other version, please go ahead. If I missed some warning, in worst case we'll just have to add it back again, and in the meanwhile `configure --disable-warnings-as-errors` is an okay workaround. >> >> It also turned out that JDK-8294281 did not save the new per-file warnings in VarDeps, so I had to move $1_WARNINGS_FLAGS from $1_BASE_CFLAGS to $1_CFLAGS (and similar for C++). >> >> Annoyingly, the assert macro triggers `tautological-constant-out-of-range-compare` on clang, so while this is a single problem in a single file, this erupts all over the place in debug builds. If this can be fixed, the ugly extra `DISABLED_WARNINGS_clang += tautological-constant-out-of-range-compare` for non-release builds can be removed. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Github workflow changes were not supposed to be in this PR... 
make/hotspot/lib/CompileJvm.gmk line 92: > 90: > 91: DISABLED_WARNINGS_clang := ignored-qualifiers sometimes-uninitialized \ > 92: missing-braces delete-non-abstract-non-virtual-dtor unknown-pragmas Shouldn't shift-negative-value be in the clang list too? ------------- PR: https://git.openjdk.org/jdk/pull/10414 From ihse at openjdk.org Wed Oct 12 13:38:22 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 13:38:22 GMT Subject: Integrated: 8295198: Update more openjdk.java.net => openjdk.org URLs In-Reply-To: <7EaJFlv0rkHlPIx439wEahllhClGQMg4XOjQAeEghT8=.9fa317de-73e5-4e90-a686-17e530cbda98@github.com> References: <7EaJFlv0rkHlPIx439wEahllhClGQMg4XOjQAeEghT8=.9fa317de-73e5-4e90-a686-17e530cbda98@github.com> Message-ID: On Wed, 12 Oct 2022 09:08:55 GMT, Magnus Ihse Bursie wrote: > In [JDK-8294618](https://bugs.openjdk.org/browse/JDK-8294618), many of the old references to openjdk.java.net was updated. > > The test code was intentionally left out of that change, but some other instances were missed, though. > > This patch will fix those misses (but will still leave test code changes to a separate fix). This pull request has now been integrated. Changeset: 84022605 Author: Magnus Ihse Bursie URL: https://git.openjdk.org/jdk/commit/8402260535eae0fb8bca2327372d03e33cc2add9 Stats: 15 lines in 6 files changed: 0 ins; 1 del; 14 mod 8295198: Update more openjdk.java.net => openjdk.org URLs Reviewed-by: erikj ------------- PR: https://git.openjdk.org/jdk/pull/10670 From ihse at openjdk.org Wed Oct 12 13:45:14 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 13:45:14 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v21] In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 13:33:26 GMT, Kim Barrett wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Github workflow changes were not supposed to be in this PR... > > make/hotspot/lib/CompileJvm.gmk line 92: > >> 90: >> 91: DISABLED_WARNINGS_clang := ignored-qualifiers sometimes-uninitialized \ >> 92: missing-braces delete-non-abstract-non-virtual-dtor unknown-pragmas > > Shouldn't shift-negative-value be in the clang list too? Well, there is currently no instance of clang complaining about this. This could be due to: * This warning does not really exist on clang * Or it is not enabled by our current clang flags * Or the code which triggers the warning in gcc is not reached by clang * Or clang is smarter than gcc and can determine that the usage is ok after all * Or clang is dumber than gcc and does not even see that there could have been a problem... ... I'm kind of reluctant to add warnings to this list that have not occurred for real. My suggestion is that we add it here if we ever see it making incorrect claims. Ok? 
------------- PR: https://git.openjdk.org/jdk/pull/10414 From aph at openjdk.org Wed Oct 12 13:52:03 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 13:52:03 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v3] In-Reply-To: <6cvdUIUEwpTzsQLQN25rjEEogFaa6hqBq2CeKv7gl14=.4fc4425b-30f4-4798-b931-5b1213ff59d9@github.com> References: <6cvdUIUEwpTzsQLQN25rjEEogFaa6hqBq2CeKv7gl14=.4fc4425b-30f4-4798-b931-5b1213ff59d9@github.com> Message-ID: On Wed, 12 Oct 2022 10:37:18 GMT, Hao Sun wrote: >> In this patch, we implement functions histogram_bytecode() and histogram_bytecode_pair() for interpreter AArch64 part. Similar to count_bytecode(), we use atomic operations to update the counters as well. >> >> Here shows part of the message produced with -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options after this patch. >> >> >> $ java -XX:+PrintBytecodeHistogram --version | head -20 >> openjdk 20-internal 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) >> >> Histogram of 5004099 executed bytecodes: >> >> absolute relative code name >> ---------------------------------------------------------------------- >> 319124 6.38% dc fast_aload_0 >> 313397 6.26% e0 fast_iload >> 251436 5.02% b6 invokevirtual >> 227428 4.54% 19 aload >> 166054 3.32% a7 goto >> 159167 3.18% 2b aload_1 >> 151803 3.03% de fast_aaccess_0 >> 136787 2.73% 1b iload_1 >> 124037 2.48% 36 istore >> 118791 2.37% 84 iinc >> 118121 2.36% 1c iload_2 >> 110484 2.21% a2 if_icmpge >> >> $ java -XX:+PrintBytecodePairHistogram --version | head -20 >> openjdk 20-internal 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) >> >> Histogram of 4804441 executed bytecode pairs: >> >> absolute relative codes 1st bytecode 2nd bytecode >> ---------------------------------------------------------------------- >> 77602 1.615% 84 a7 iinc goto >> 49749 1.035% 36 e0 istore fast_iload >> 48931 1.018% e0 10 fast_iload bipush >> 46294 0.964% e0 b6 fast_iload invokevirtual >> 42661 0.888% a7 e0 goto fast_iload >> 42243 0.879% 3a 19 astore aload >> 40138 0.835% 19 b9 aload invokeinterface >> 36617 0.762% dc 2b fast_aload_0 aload_1 >> 35745 0.744% b7 dc invokespecial fast_aload_0 >> 35384 0.736% 19 b6 aload invokevirtual >> 35035 0.729% b6 de invokevirtual fast_aaccess_0 >> 34667 0.722% dc b6 fast_aload_0 invokevirtual >> >> >> In order to verfiy the correctness, I took the trace information produced by -XX:+TraceBytecodes as a cross reference. The hit times for some bytecodes/bytecode pairs can be obtained via parsing the trace. Then I compared the hit times with the corresponding "absolute" columns. I randomly selected several bytecodes/bytecode pairs, and the manual comparion results showed that "absolute" columns are correct. >> >> Note-1: count_bytecode() is updated. 1) caller-saved registers are used as temporary registers and there is no need to save/restore them. 2) atomic_addw() should be used since the counter is of int type. >> >> Note-2: As shown by the update in file templateInterpreterGenerator.cpp, function histogram_bytecode() should be invoked only inside !PRODUCT scope. 
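To make the histogram scheme in the quoted description easier to follow, here is a stand-alone C++ sketch of a bytecode-pair histogram. Names and sizes are simplified assumptions, not the HotSpot implementation; the real change generates the equivalent updates as AArch64 assembly in the template interpreter.

```
#include <atomic>
#include <cstdint>

constexpr int kLog2Codes = 8;                 // assumption: a bytecode fits in 8 bits
constexpr int kNumCodes  = 1 << kLog2Codes;
constexpr int kNumPairs  = kNumCodes * kNumCodes;

// _index-like state: the most recent bytecode lives in the high bits, so
// shifting right drops the older one and the new code is OR'ed on top.
static uint32_t              pair_index = 0;   // shared and racy: statistics only
static std::atomic<uint64_t> pair_counters[kNumPairs];

inline void record_bytecode(uint8_t code) {
  uint32_t idx = (pair_index >> kLog2Codes) | (uint32_t(code) << kLog2Codes);
  pair_index = idx;                                            // non-atomic on purpose
  pair_counters[idx].fetch_add(1, std::memory_order_relaxed);  // atomic counter bump
}
```

The counter bump is the part worth making atomic; a racy `pair_index` only skews a handful of samples, which matches the conclusion reached in the review comments that follow.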
> > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Remove rscratch3 for count_bytecode() and histogram_bytecode() src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 2004: > 2002: /* kind */ Assembler::LSR, > 2003: /* shift */ BytecodePairHistogram::log2_number_of_codes); > 2004: I've had another look at this. `_index` is a two-element queue of bytecodes, but it is shared between all threads. If two threads access `_index` racily the result will be invalid, regardless of whether the OR into memory is atomic, so there is no point in trying to update `_index` atomically. This will make this PR much simpler. None of the other ports access `_index` atomically, and neither need we. On the other hand, bumping the bytecode counter atomically is fine. We could make a thread-local `_index`, but it's too much code to be worthwhile. ------------- PR: https://git.openjdk.org/jdk/pull/10642 From aph at openjdk.org Wed Oct 12 13:54:05 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 13:54:05 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: <_1A5xHTrbMO1_G-ctty0HXkdBhoTSPRo_MRMFktm1xY=.aded8e35-ba59-433e-8e41-af3b80ffeca6@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <_1A5xHTrbMO1_G-ctty0HXkdBhoTSPRo_MRMFktm1xY=.aded8e35-ba59-433e-8e41-af3b80ffeca6@github.com> Message-ID: On Tue, 11 Oct 2022 18:45:56 GMT, Daniel D. Daugherty wrote: > It seems strange to me that the native library part is here: > > test/hotspot/jtreg/runtime/jni/TestDenormalFloat/libfast-math.c > > and the two test files are here: > > test/hotspot/jtreg/compiler/floatingpoint/TestDenormalDouble.java test/hotspot/jtreg/compiler/floatingpoint/TestDenormalFloat.java I already moved it. > And the two tests don't have "@run main/native"... Maybe I'm missing something about what you're trying to test here. Will fix. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From chagedorn at openjdk.org Wed Oct 12 13:56:12 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 12 Oct 2022 13:56:12 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: References: Message-ID: <7OZU4WgKLkiepkWFzN_VP4zsHRy9EkbmOFtco2AAhPw=.35345d5f-100d-474a-a3cc-dcab48bc1219@github.com> On Tue, 11 Oct 2022 08:18:08 GMT, Christian Hagedorn wrote: >> The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. 
We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. >> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Always read full filename and strip prefix path and only then cut filename to fit output buffer > - Merge branch 'master' into JDK-8293422 > - Merge branch 'master' into JDK-8293422 > - Review comments from Thomas > - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ > - 8293422: DWARF emitted by Clang cannot be parsed That would be an option to bump that minimum version to get rid of the `ifdefs` for clang 5.0. But since this has quite an impact, are we allowed to just do that in this change or is it required to have a separate task for that with approvals first? We would also need to change the docs to reflect that change etc. ------------- PR: https://git.openjdk.org/jdk/pull/10287 From aph at openjdk.org Wed Oct 12 14:00:31 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 14:00:31 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 00:38:39 GMT, David Holmes wrote: > This appears to be a 10 year old bug in gcc. Have we ever had any issues reported because of this? That's what is so insidious about messing with the floating-point control word: all that happens is people get slightly inaccurate results, which they don't notice. > Inserting a workaround now seems rather late. Any FP-using native code can potentially break Java's FP semantics if it messes with the FPU control word (ref the old Borland compilers). Indeed, but there seem to be a number of libraries out there compiled with -ffast-math. This is the kind of thing people do with highly performance-critical stuff like ray tracing, etc. But we don't want it to break Java. > Shouldn't any workaround only be needed for the internals of `System.loadLibrary` as other JDK usages of `dlopen` should know what they are opening and that they are libraries that don't have this problem? Maybe, but sometimes we use system- or user-provided libraries. 
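The silent inaccuracy being described here is easy to demonstrate outside Java. A small self-contained C++ check in the same spirit as the TestDenormal* regression tests mentioned in this thread (illustration only, not one of the jtreg tests):

```
#include <cfloat>
#include <cstdio>

// With IEEE-conformant arithmetic, FLT_MIN / 2 is a nonzero subnormal.
// If a loaded DSO's -ffast-math constructor enabled Flush-To-Zero, the
// division silently produces 0.0f instead.
int main() {
  volatile float smallest_normal = FLT_MIN;
  volatile float subnormal = smallest_normal / 2.0f;
  if (subnormal == 0.0f) {
    std::puts("FAIL: subnormal flushed to zero (FTZ/DAZ is enabled)");
    return 1;
  }
  std::puts("OK: denormal arithmetic behaves as expected");
  return 0;
}
```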
------------- PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Wed Oct 12 14:00:31 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 14:00:31 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <0KP9pSAaKoDlHfA-yBHi1T4LXnIcPNIM0WnnYo1pwG8=.813aeb96-c059-4cb4-a10a-688e4e00ecf4@github.com> On Wed, 12 Oct 2022 11:28:08 GMT, Magnus Ihse Bursie wrote: > > Also: this patch is Linux-only. I'll ask for help from build experts to make the tests GCC-only; it's not clear to me how. > > @theRealAph We currently have a more or less 1-to-1 relationship between OS and compiler. From a portability perspective this is not ideal, but it is also hard to keep some kind of theoretical barrier when it never is tested. I see. > Are you worried that this is a problem that occurs with gcc on other platforms as well, or that it occurs on linux with other compilers than gcc? The former. > We do not support gcc on macos, windows or aix, which leaves linux as the only gcc platform in the mainline (ports in separate repos will have to handle this themselves). We do support using clang on linux instead of gcc (but do not test it regularly). But my impression here is that this is more of a gcc-problem than a linux problem..? Probably, yes. I'm happy just to fix it on Linux. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Wed Oct 12 14:11:46 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 14:11:46 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v3] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/8d834739..d5655d4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Wed Oct 12 14:11:48 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 14:11:48 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: <0KP9pSAaKoDlHfA-yBHi1T4LXnIcPNIM0WnnYo1pwG8=.813aeb96-c059-4cb4-a10a-688e4e00ecf4@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <0KP9pSAaKoDlHfA-yBHi1T4LXnIcPNIM0WnnYo1pwG8=.813aeb96-c059-4cb4-a10a-688e4e00ecf4@github.com> Message-ID: On Wed, 12 Oct 2022 13:58:06 GMT, Andrew Haley wrote: > We do not support gcc on macos, windows or aix, which leaves linux as the only gcc platform in the mainline (ports in separate repos will have to handle this themselves). We do support using clang on linux instead of gcc (but do not test it regularly). But my impression here is that this is more of a gcc-problem than a linux problem..? Indeed. The problem is that we have no idea what compiler was used to compile the libraries we call, so I'd like to be conservative. The increase in runtime due to saving and restoring the floating-point environment is very low, given that `dlopen()` is a pretty expensive operation. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From ihse at openjdk.org Wed Oct 12 14:22:07 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 14:22:07 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v3] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 14:11:46 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic So you want to provide this dlopen wrapper for other platforms as well? 
------------- PR: https://git.openjdk.org/jdk/pull/10661 From ihse at openjdk.org Wed Oct 12 14:30:07 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 12 Oct 2022 14:30:07 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 08:18:08 GMT, Christian Hagedorn wrote: >> The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. >> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Always read full filename and strip prefix path and only then cut filename to fit output buffer > - Merge branch 'master' into JDK-8293422 > - Merge branch 'master' into JDK-8293422 > - Review comments from Thomas > - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ > - 8293422: DWARF emitted by Clang cannot be parsed If is is a big change or not depends on how it affects builds on macos. I assume that clang functionality that was published in 2017 is already incorporated in the minimum supported version of Xcode on mac. (This needs to be verified, though) For clang on linux; Oracle do not regularly build linux with clang, nor do our GHA build scripts. I don't know if anyone regularly tests this. My guess is that the 3.5 limit was put in place when the clang for linux support was added, and it has not been modified since. But you are probably right that it should be a separate PR, if not for anything else so to get proper attention to it. Do you want me to publish such a PR first, or do you want to continue with the current conditionals, and clean them up afterwards if/when we go to clang 5.0+? 
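For readers less familiar with the `.debug_aranges` walk described in the quoted DWARF text, here is a simplified sketch of the lookup. The struct and function names are illustrative assumptions, not the HotSpot DWARF parser; the real code reads the entries straight out of the ELF section.

```
#include <cstdint>
#include <optional>
#include <vector>

struct ArangeEntry {
  uint64_t start;      // first library offset covered by this range
  uint64_t length;     // size of the range
  uint64_t cu_offset;  // offset of the owning compilation unit in .debug_info
};

// Walk the .debug_aranges entries and return the .debug_info offset of the
// compilation unit whose range contains pc_offset (the pc expressed as an
// offset into the loaded library).
std::optional<uint64_t> find_compilation_unit(const std::vector<ArangeEntry>& aranges,
                                              uint64_t pc_offset) {
  for (const ArangeEntry& e : aranges) {
    if (pc_offset >= e.start && pc_offset < e.start + e.length) {
      return e.cu_offset;
    }
  }
  return std::nullopt;  // incomplete .debug_aranges (e.g. Clang): no match found
}
```

An incomplete section is exactly the case where this walk comes back empty, which is why the fallback has to go through `.debug_info` instead.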
------------- PR: https://git.openjdk.org/jdk/pull/10287 From aph at openjdk.org Wed Oct 12 14:33:18 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 14:33:18 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v4] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/d5655d4e..fe388b8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Wed Oct 12 14:37:37 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 14:37:37 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v5] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/fe388b8b..bcff4597 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=03-04 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Wed Oct 12 14:43:10 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 14:43:10 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v3] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 14:19:53 GMT, Magnus Ihse Bursie wrote: > So you want to provide this dlopen wrapper for other platforms as well? I'm not sure. It sounds to me like a Linux-only patch would do it, but I think BSD uses GCC sometimes. (Open/FreeBSD is downstream of OpenJDK, and I'd love to get it into mainline, but getting BSD contributors to sign OCA is hard.) ------------- PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Wed Oct 12 15:49:07 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 15:49:07 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <5hqEYIcXwutqjcM6g3AAeXGme5OpWZ9ooduEUiK3dp8=.457e808f-2c15-4b35-84e7-f6a24f27a858@github.com> On Wed, 12 Oct 2022 13:56:56 GMT, Andrew Haley wrote: > > This appears to be a 10 year old bug in gcc. Have we ever had any issues reported because of this? > > That's what is so insidious about messing with the floating-point control word: all that happens is people get slightly inaccurate results, which they don't notice. e.g. https://github.com/gevent/gevent/pull/1820 is for Python (SciPy) and adversely affects time to converge. I have no idea if other Java users have similar problems, but it's a bug that I'd like to go away. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From shade at openjdk.org Wed Oct 12 16:13:12 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Oct 2022 16:13:12 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v5] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 14:37:37 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic I think the Java arithmetic correctness is the thing we are after. Allowing a stray library to break it seems very wrong. The alternative to this fix would be detecting the "bad" FPU flags like FTZ/DAZ, provided we can do that without messing with a whole lot of arch-specific code, and then asking users to opt-in for potentially breaking behavior. But I believe current patch is the minimal thing we should do. Some nits: src/hotspot/os/linux/os_linux.cpp line 113: > 111: # include > 112: # include > 113: # include Do we need to shuffle^W sort includes in this patch? I presume you'd want this patch to be cleanly backportable, which means it should probably be as point-y as it can get. src/hotspot/os/linux/os_linux.cpp line 1747: > 1745: void * os::Linux::dlopen_helper(const char *filename, char *ebuf, > 1746: int ebuflen) { > 1747: // JDK-8295159: Protect floating-point environment. We need to be more verbose in these comments. Say something like: There are known cases where global library initialization sets the FPU flags that affect computation accuracy, for example, enabling Flush-To-Zero and Denormals-Are-Zero. Do not let those libraries break the Java arithmetic. Unfortunately, this might break the libraries that might depend on these FPU features for performance and/or numerical "accuracy", but we need to protect the Java semantics first and foremost. See JDK-8295159. src/hotspot/os/linux/os_linux.cpp line 1750: > 1748: fenv_t curr_fenv; > 1749: int rtn = fegetenv(&curr_fenv); > 1750: assert(rtn == 0, "must be."); Suggestion: assert(rtn == 0, "fegetnv must succeed"); src/hotspot/os/linux/os_linux.cpp line 1753: > 1751: void * result = ::dlopen(filename, RTLD_LAZY); > 1752: rtn = fesetenv(&curr_fenv); > 1753: assert(rtn == 0, "must be."); Suggestion: assert(rtn == 0, "fesetenv must succeed"); test/hotspot/jtreg/compiler/floatingpoint/TestDenormalFloat.java line 28: > 26: * @bug 8295159 > 27: * @summary DSO created with -ffast-math breaks Java floating-point arithmetic > 28: * @run main/othervm compiler.floatingpoint.TestDenormalFloat Should it have `/native` somewhere here? test/hotspot/jtreg/compiler/floatingpoint/TestDenormalFloat.java line 39: > 37: // at compiler.floatingpoint.TestDenormalFloat.testFloats(TestDenormalFloat.java:47) > 38: // at compiler.floatingpoint.TestDenormalFloat.main(TestDenormalFloat.java:57) > 39: This comment is redundant, I think. There seem to be little point to run these tests separately? test/hotspot/jtreg/compiler/floatingpoint/TestDenormalFloat.java line 60: > 58: public static void main(String[] args) { > 59: testFloats(); > 60: System.out.println("Loading libfast-math.so"); I propose we have the same logging statement in `TestDoubles`. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Wed Oct 12 16:13:13 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 16:13:13 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v5] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <9bkcZXTPQDk2_dgCru30yVygdPp0Iga8MrJeINLd198=.8d4ca3c1-f9f4-45e8-afef-303af9f3fd5b@github.com> On Wed, 12 Oct 2022 15:48:25 GMT, Aleksey Shipilev wrote: > Do we need to shuffle^W sort includes in this patch? 
I presume you'd want this patch to be cleanly backportable, which means it should probably be as point-y as it can get. OK, I can back that part out. > src/hotspot/os/linux/os_linux.cpp line 1747: > >> 1745: void * os::Linux::dlopen_helper(const char *filename, char *ebuf, >> 1746: int ebuflen) { >> 1747: // JDK-8295159: Protect floating-point environment. > > We need to be more verbose in these comments. Say something like: > > > There are known cases where global library initialization sets the FPU flags > that affect computation accuracy, for example, enabling Flush-To-Zero and > Denormals-Are-Zero. Do not let those libraries break the Java arithmetic. > Unfortunately, this might break the libraries that might depend on these FPU > features for performance and/or numerical "accuracy", but we need to protect > the Java semantics first and foremost. See JDK-8295159. Thanks, that is elegantly expressed. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Wed Oct 12 16:20:10 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 16:20:10 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v6] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/os/linux/os_linux.cpp Co-authored-by: Aleksey Shipil?v - Update src/hotspot/os/linux/os_linux.cpp Co-authored-by: Aleksey Shipil?v ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/bcff4597..3f442beb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From tschatzl at openjdk.org Wed Oct 12 16:49:41 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Oct 2022 16:49:41 GMT Subject: RFR: 8295118: G1: Clear CLD claim marks concurrently Message-ID: Hi all, can I have reviews for this change that moves out clearing CLD marks from the concurrent start pause to the concurrent phase? The idea is that instead of clearing CLD marks just before marking through, clear the marks at the end of the concurrent phases (or at the end of the full gc) so that after that operation marks are reset. 
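As background for the CLD claim marks: they are per-node flags that let many GC workers visit each class loader's data at most once. A minimal, generic C++ sketch of the pattern (not the HotSpot `ClassLoaderDataGraph` code) shows why clearing at the end of the cycle lets the next marking start assume zeroed marks:

```
#include <atomic>
#include <vector>

struct Claimable {
  std::atomic<int> claim{0};
  // The first worker to swap 0 -> tag processes the node; everyone else skips it.
  bool try_claim(int tag) {
    int expected = 0;
    return claim.compare_exchange_strong(expected, tag);
  }
};

// Run at the *end* of a concurrent cycle (or of a full GC), so the subsequent
// marking start no longer needs a clearing step in the pause.
inline void clear_claim_marks(std::vector<Claimable*>& nodes) {
  for (Claimable* n : nodes) {
    n->claim.store(0, std::memory_order_relaxed);
  }
}
```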
I believe that one can save one of the `ClassLoaderDataGraph::clear_claimed_marks` in full gc by using different claim values (we need the one at the beginning and the end though), but the overhead of that should be minimal compared to actual full gc time. Testing: tier1-5 ------------- Commit messages: - initial version Changes: https://git.openjdk.org/jdk/pull/10675/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10675&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295118 Stats: 59 lines in 9 files changed: 19 ins; 28 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/10675.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10675/head:pull/10675 PR: https://git.openjdk.org/jdk/pull/10675 From aph at openjdk.org Wed Oct 12 17:00:15 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 12 Oct 2022 17:00:15 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/3f442beb..64ef36f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=05-06 Stats: 62 lines in 3 files changed: 27 ins; 29 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From vlivanov at openjdk.org Wed Oct 12 17:46:10 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 12 Oct 2022 17:46:10 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <3CHKC_SEGCKm2D3rHntfxV21sx0G-AakDLPPf8rPwQc=.533cc9d9-f93b-4a6f-86f6-a221452045d4@github.com> On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. 
I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic Isn't it an illustration of a more general problem we have with native code where it can mess with FP environment at any time? We already have similar problems with MXCSR register and provide verification logic (part of `-Xcheck:jni`) to catch modifications and support conditional restoration of MXCSR register on x86_64. x86_32 validates x87 control word when `-Xcheck:jni` is enabled. Should we do something similar here instead? ------------- PR: https://git.openjdk.org/jdk/pull/10661 From jvernee at openjdk.org Wed Oct 12 20:31:23 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 12 Oct 2022 20:31:23 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <7yaDR-VCplkpcKoGwJ4_4rZczua_vXMVVXsZ4jFbwYg=.522bd3a5-4b70-4fd7-a6af-184fde528750@github.com> On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic For upcalls on non-Windows platforms, we also save MXCSR and restore it after the call, and load a set standard value for the Java code that's about to be executed. (I'm not sure why this isn't done on Windows, tbh) Relevant code for JNI is here: - Downcalls: https://github.com/openjdk/jdk/blob/1961e81e02e707cd0c8241aa3af6ddabf7668589/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5002 - Upcalls: https://github.com/openjdk/jdk/blob/1961e81e02e707cd0c8241aa3af6ddabf7668589/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L262 FPCSR is only handled on non _LP64 it looks like. I agree with Vladimir that this seems like a general problem of foreign code potentially messing with control bits (in theory, foreign code could violate its ABI in other ways as well). It seems that both major C/C++ x64 ABIs ([1], [2], [3]) treat the control bits as non-volatile, so the callee should preserve them. This is in theory a choice of a particular ABI, but I think in general we can assume foreign code does not modify the control bits. Though, we never know for sure of course, and I suppose this is where `RestoreMXCSROnJNICalls` comes in. There's no equivalent flag for FPCSR atm AFAICS, so the answer there seems to be "just don't do it". Following that logic: from our perspective `dlopen` violates its ABI in certain cases. Preserving the control bits across calls to `dlopen` seems like a pragmatic solution. 
I'm not sure how important it is to have an opt-in for the current (broken) behavior... [1]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#fpcsr [2]: https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#mxcsr [3]: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf ------------- PR: https://git.openjdk.org/jdk/pull/10661 From mcimadamore at openjdk.org Wed Oct 12 20:39:06 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 12 Oct 2022 20:39:06 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: <3CHKC_SEGCKm2D3rHntfxV21sx0G-AakDLPPf8rPwQc=.533cc9d9-f93b-4a6f-86f6-a221452045d4@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <3CHKC_SEGCKm2D3rHntfxV21sx0G-AakDLPPf8rPwQc=.533cc9d9-f93b-4a6f-86f6-a221452045d4@github.com> Message-ID: On Wed, 12 Oct 2022 17:43:47 GMT, Vladimir Ivanov wrote: > Isn't it an illustration of a more general problem we have with native code where it can mess with FP environment at any time? > > We already have similar problems with MXCSR register and provide verification logic (part of `-Xcheck:jni`) to catch modifications and support conditional restoration of MXCSR register on x86_64. x86_32 validates x87 control word when `-Xcheck:jni` is enabled. > > Should we do something similar here instead? I tend to agree. As others have observed, a `dlopen` call (or something with same nefarious behavior) could also happen inside JNI code. But I think the interesting (and perhaps surprising) part here is that, from the perspective of the developer, no native code has executed - only a library has been loaded (via `System::loadLibrary`). Note also that this specific problem is triggered by `dlopen` itself, because certain libraries might have some "bad" (from the perspective of JVM) initialization code. But since we're talking about JNI, JNI_OnLoad is another potential source of problem, as its native code is executed as soon as the library is loaded (and that, too, can leave the JVM in a bad state). ------------- PR: https://git.openjdk.org/jdk/pull/10661 From vlivanov at openjdk.org Wed Oct 12 21:50:14 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 12 Oct 2022 21:50:14 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <3CHKC_SEGCKm2D3rHntfxV21sx0G-AakDLPPf8rPwQc=.533cc9d9-f93b-4a6f-86f6-a221452045d4@github.com> Message-ID: On Wed, 12 Oct 2022 20:35:09 GMT, Maurizio Cimadamore wrote: > Note also that this specific problem is triggered by dlopen itself, because certain libraries might have some "bad" (from the perspective of JVM) initialization code. ... and it leads to another question whether the JVM itself is amenable to such problems. Do we need to sanitize the environment when returning from calls into the JVM? ------------- PR: https://git.openjdk.org/jdk/pull/10661 From dholmes at openjdk.org Thu Oct 13 02:27:04 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Oct 2022 02:27:04 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL In-Reply-To: References: <-gHZi1XpPZBoUl_RS2ETKpgODYvyiUvz5hi2AQlMoDQ=.d0a31f28-04e4-4ae9-b8f4-76415a154853@github.com> Message-ID: On Wed, 12 Oct 2022 08:28:56 GMT, Johan Sj?len wrote: >> ?? 
I only see one usage in the switch statement so don't understand why this is not inline as normal logging code would be. > > I should've said "crosses initialization error", not "crossing." > > Check out this SO question: https://stackoverflow.com/questions/11578936/getting-a-bunch-of-crosses-initialization-error > > So `LogTarget(Debug, deoptimization) lt;` is the error here. I *think* that we can inline it if we introduce a surrounding scope, is that preferable to you? > > So: > > > case T_OBJECT: > *addr = value->get_int(T_OBJECT); > { // Scope off LogTarget > LogTarget(Debug, deoptimization) lt; > if (lt.is_enabled()) { > LogStream ls(lt); > ls.print(" - Reconstructed expression %d (OBJECT): ", i); > oop o = cast_to_oop((address)(*addr)); > if (o == NULL) { > ls.print_cr("NULL"); > } else { > ResourceMark rm; > ls.print_raw_cr(o->klass()->name()->as_C_string()); > } > } > } Yes adding the scope is fine and preferable. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10645 From dholmes at openjdk.org Thu Oct 13 02:35:05 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 13 Oct 2022 02:35:05 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL In-Reply-To: References: <_mlD5HyVj54tSXfOJY7bQi7NHxiwk2hhRpCkja2xHfg=.890bcecc-64dc-4d41-a656-f27aef4cea29@github.com> Message-ID: <9hxX0Gr6WNWv1iDBmEEAbyW0_rm40k5j3wFWehRuD1s=.66c7a911-176d-47ab-84e4-1d013a521503@github.com> On Wed, 12 Oct 2022 08:33:12 GMT, Johan Sj?len wrote: >> src/hotspot/share/runtime/vframe.hpp line 113: >> >>> 111: virtual void print_value() const; >>> 112: virtual void print(); >>> 113: void print_on(outputStream* st) const override; >> >> Unclear if this should also be virtual. > > This is virtual, taken from `ResourceObj`. The `override` indicates this, I'm basing this on: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rh-override Thanks - missed that. ------------- PR: https://git.openjdk.org/jdk/pull/10645 From haosun at openjdk.org Thu Oct 13 04:12:22 2022 From: haosun at openjdk.org (Hao Sun) Date: Thu, 13 Oct 2022 04:12:22 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v4] In-Reply-To: References: Message-ID: > In this patch, we implement functions histogram_bytecode() and histogram_bytecode_pair() for interpreter AArch64 part. Similar to count_bytecode(), we use atomic operations to update the counters as well. > > Here shows part of the message produced with -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options after this patch. 
> > > $ java -XX:+PrintBytecodeHistogram --version | head -20 > openjdk 20-internal 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) > > Histogram of 5004099 executed bytecodes: > > absolute relative code name > ---------------------------------------------------------------------- > 319124 6.38% dc fast_aload_0 > 313397 6.26% e0 fast_iload > 251436 5.02% b6 invokevirtual > 227428 4.54% 19 aload > 166054 3.32% a7 goto > 159167 3.18% 2b aload_1 > 151803 3.03% de fast_aaccess_0 > 136787 2.73% 1b iload_1 > 124037 2.48% 36 istore > 118791 2.37% 84 iinc > 118121 2.36% 1c iload_2 > 110484 2.21% a2 if_icmpge > > $ java -XX:+PrintBytecodePairHistogram --version | head -20 > openjdk 20-internal 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) > > Histogram of 4804441 executed bytecode pairs: > > absolute relative codes 1st bytecode 2nd bytecode > ---------------------------------------------------------------------- > 77602 1.615% 84 a7 iinc goto > 49749 1.035% 36 e0 istore fast_iload > 48931 1.018% e0 10 fast_iload bipush > 46294 0.964% e0 b6 fast_iload invokevirtual > 42661 0.888% a7 e0 goto fast_iload > 42243 0.879% 3a 19 astore aload > 40138 0.835% 19 b9 aload invokeinterface > 36617 0.762% dc 2b fast_aload_0 aload_1 > 35745 0.744% b7 dc invokespecial fast_aload_0 > 35384 0.736% 19 b6 aload invokevirtual > 35035 0.729% b6 de invokevirtual fast_aaccess_0 > 34667 0.722% dc b6 fast_aload_0 invokevirtual > > > In order to verfiy the correctness, I took the trace information produced by -XX:+TraceBytecodes as a cross reference. The hit times for some bytecodes/bytecode pairs can be obtained via parsing the trace. Then I compared the hit times with the corresponding "absolute" columns. I randomly selected several bytecodes/bytecode pairs, and the manual comparion results showed that "absolute" columns are correct. > > Note-1: count_bytecode() is updated. 1) caller-saved registers are used as temporary registers and there is no need to save/restore them. 2) atomic_addw() should be used since the counter is of int type. > > Note-2: As shown by the update in file templateInterpreterGenerator.cpp, function histogram_bytecode() should be invoked only inside !PRODUCT scope. 
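For readers skimming the thread, the bookkeeping behind these two options is small enough to sketch in plain C++. The sketch below is a conceptual model only, not the HotSpot source or the generated AArch64 code; the constant 8 stands in for BytecodePairHistogram::log2_number_of_codes, and the names profile_bytecode, counters, pair_counters and last_index are illustrative.

    #include <atomic>

    // Conceptual model of the counters behind -XX:+PrintBytecodeHistogram and
    // -XX:+PrintBytecodePairHistogram (not HotSpot code).
    static std::atomic<int> counters[256];             // one slot per opcode
    static std::atomic<int> pair_counters[256 * 256];  // one slot per (previous, current) pair
    static int last_index = 0;                         // rolling pair index, deliberately not atomic

    static void profile_bytecode(int code) {
      // Per-opcode count for the plain histogram.
      counters[code].fetch_add(1, std::memory_order_relaxed);
      // Shift out the older of the two remembered opcodes and OR in the new one;
      // the combined value indexes the pair table.
      int idx = (last_index >> 8) | (code << 8);       // 8 stands in for log2_number_of_codes
      last_index = idx;                                // shared and racy, see the review below
      pair_counters[idx].fetch_add(1, std::memory_order_relaxed);
    }

As the review notes further down, only the counter increments need to be atomic; the shared pair index is left racy, which at worst slightly skews the pair statistics.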
Hao Sun has updated the pull request incrementally with one additional commit since the last revision: Remove the atomic operation to "_index" ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10642/files - new: https://git.openjdk.org/jdk/pull/10642/files/bbbc3020..84556aeb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10642&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10642&range=02-03 Stats: 32 lines in 3 files changed: 1 ins; 22 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/10642.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10642/head:pull/10642 PR: https://git.openjdk.org/jdk/pull/10642 From haosun at openjdk.org Thu Oct 13 04:12:23 2022 From: haosun at openjdk.org (Hao Sun) Date: Thu, 13 Oct 2022 04:12:23 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v3] In-Reply-To: References: <6cvdUIUEwpTzsQLQN25rjEEogFaa6hqBq2CeKv7gl14=.4fc4425b-30f4-4798-b931-5b1213ff59d9@github.com> Message-ID: On Wed, 12 Oct 2022 13:48:25 GMT, Andrew Haley wrote: >> Hao Sun has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove rscratch3 for count_bytecode() and histogram_bytecode() > > src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 2004: > >> 2002: /* kind */ Assembler::LSR, >> 2003: /* shift */ BytecodePairHistogram::log2_number_of_codes); >> 2004: > > I've had another look at this. `_index` is a two-element queue of bytecodes, but it is shared between all threads. If two threads access `_index` racily the result will be invalid, regardless of whether the OR into memory is atomic. This will make this PR much simpler. > None of the other ports access `_index` atomically, nor should we. On the other hand, bumping the bytecode counter atomically is fine. > We could make a thread-local `_index`, but it's too much code to be worthwhile. Thanks for pointing this out. Updated. ------------- PR: https://git.openjdk.org/jdk/pull/10642 From aboldtch at openjdk.org Thu Oct 13 06:19:47 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 13 Oct 2022 06:19:47 GMT Subject: RFR: 8295257: Remove implicit noreg temp register arguments in aarch64 MacroAssembler Message-ID: Remove implicit `= noreg` temporary register arguments for the three methods that still have them. * `load_heap_oop` * `store_heap_oop` * `load_heap_oop_not_null` Only `load_heap_oop` is used with the implicit `= noreg` arguments. After [JDK-8293351](https://bugs.openjdk.org/browse/JDK-8293351) the GCs only use explicitly passed in registers. This will also be the case for generational ZGC, where it currently requires `load_heap_oop` to provide a second temporary register.
Testing: linux-aarch64, macosx-aarch64 tier 1-3 ------------- Commit messages: - UPSTREAM: Remove implicit aarch64 registers Changes: https://git.openjdk.org/jdk/pull/10688/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10688&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295257 Stats: 14 lines in 3 files changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/10688.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10688/head:pull/10688 PR: https://git.openjdk.org/jdk/pull/10688 From jbhateja at openjdk.org Thu Oct 13 07:26:06 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 13 Oct 2022 07:26:06 GMT Subject: RFR: 8293409: [vectorapi] Intrinsify VectorSupport.indexVector In-Reply-To: References: Message-ID: <_wyFWAET_qXwwj-9Iq9AsPAGbT3AXIwN6HujmwZVRPw=.9c652886-4255-4c03-89d9-e3c74f9f319a@github.com> On Mon, 19 Sep 2022 08:51:24 GMT, Xiaohong Gong wrote: > "`VectorSupport.indexVector()`" is used to compute a vector that contains the index values based on a given vector and a scale value (`i.e. index = vec + iota * scale`). This function is widely used in other APIs like "`VectorMask.indexInRange`" which is useful to the tail loop vectorization. And it can be easily implemented with the vector instructions. > > This patch adds the vector intrinsic implementation of it. The steps are: > > 1) Load the const "iota" vector. > > We extend the "`vector_iota_indices`" stubs from byte to other integral types. For floating point vectors, it needs an additional vector cast to get the right iota values. > > 2) Compute indexes with "`vec + iota * scale`" > > Here is the performance result to the new added micro benchmark on ARM NEON: > > Benchmark Gain > IndexVectorBenchmark.byteIndexVector 1.477 > IndexVectorBenchmark.doubleIndexVector 5.031 > IndexVectorBenchmark.floatIndexVector 5.342 > IndexVectorBenchmark.intIndexVector 5.529 > IndexVectorBenchmark.longIndexVector 3.177 > IndexVectorBenchmark.shortIndexVector 5.841 > > > Please help to review and share the feedback! Thanks in advance! src/hotspot/share/opto/vectorIntrinsics.cpp line 2949: > 2947: } else if (elem_bt == T_DOUBLE) { > 2948: iota = gvn().transform(new VectorCastL2XNode(iota, vt)); > 2949: } Since we are loading constants from stub initialized memory locations, defining new stubs for floating point iota indices may eliminate the need for costly conversion instructions. Especially on X86, conversion between Long and Double is only supported by AVX512DQ targets, and intrinsification may fail for legacy targets. src/hotspot/share/opto/vectorIntrinsics.cpp line 2978: > 2976: case T_DOUBLE: { > 2977: scale = gvn().transform(new ConvI2LNode(scale)); > 2978: scale = gvn().transform(new ConvL2DNode(scale)); A prior target support check for these IR nodes may prevent surprises in the backend. src/hotspot/share/opto/vectorIntrinsics.cpp line 2978: > 2976: case T_DOUBLE: { > 2977: scale = gvn().transform(new ConvI2LNode(scale)); > 2978: scale = gvn().transform(new ConvL2DNode(scale)); Any specific reason for not directly using ConvI2D for the double case?
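To make the operation being intrinsified concrete: in scalar form it is just index[i] = vec[i] + i * scale, as in the C++ sketch below. This is illustrative only; the real change builds C2 IR nodes, and the iota constants come from the vector_iota_indices stubs mentioned above.

    #include <cstddef>

    // Scalar reference semantics of indexVector (sketch, not JDK code):
    // index[i] = vec[i] + iota[i] * scale, with iota[i] == i.
    template <typename T>
    void index_vector(const T* vec, T scale, T* index, std::size_t len) {
      for (std::size_t i = 0; i < len; i++) {
        index[i] = vec[i] + static_cast<T>(i) * scale;
      }
    }

The floating-point cases are the awkward ones precisely because the iota stubs hold integral values, hence the vector cast in the patch, or, as suggested above, dedicated floating-point iota stubs.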
------------- PR: https://git.openjdk.org/jdk/pull/10332 From xgong at openjdk.org Thu Oct 13 07:32:07 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 13 Oct 2022 07:32:07 GMT Subject: RFR: 8293409: [vectorapi] Intrinsify VectorSupport.indexVector In-Reply-To: <_wyFWAET_qXwwj-9Iq9AsPAGbT3AXIwN6HujmwZVRPw=.9c652886-4255-4c03-89d9-e3c74f9f319a@github.com> References: <_wyFWAET_qXwwj-9Iq9AsPAGbT3AXIwN6HujmwZVRPw=.9c652886-4255-4c03-89d9-e3c74f9f319a@github.com> Message-ID: On Thu, 13 Oct 2022 07:18:24 GMT, Jatin Bhateja wrote: >> "`VectorSupport.indexVector()`" is used to compute a vector that contains the index values based on a given vector and a scale value (`i.e. index = vec + iota * scale`). This function is widely used in other APIs like "`VectorMask.indexInRange`" which is useful to the tail loop vectorization. And it can be easily implemented with the vector instructions. >> >> This patch adds the vector intrinsic implementation of it. The steps are: >> >> 1) Load the const "iota" vector. >> >> We extend the "`vector_iota_indices`" stubs from byte to other integral types. For floating point vectors, it needs an additional vector cast to get the right iota values. >> >> 2) Compute indexes with "`vec + iota * scale`" >> >> Here is the performance result to the new added micro benchmark on ARM NEON: >> >> Benchmark Gain >> IndexVectorBenchmark.byteIndexVector 1.477 >> IndexVectorBenchmark.doubleIndexVector 5.031 >> IndexVectorBenchmark.floatIndexVector 5.342 >> IndexVectorBenchmark.intIndexVector 5.529 >> IndexVectorBenchmark.longIndexVector 3.177 >> IndexVectorBenchmark.shortIndexVector 5.841 >> >> >> Please help to review and share the feedback! Thanks in advance! > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2978: > >> 2976: case T_DOUBLE: { >> 2977: scale = gvn().transform(new ConvI2LNode(scale)); >> 2978: scale = gvn().transform(new ConvL2DNode(scale)); > > Any specific reason for not directly using ConvI2D for double case. Good catch, I think it's ok to use ConvI2D here. I will change this. Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/10332 From chagedorn at openjdk.org Thu Oct 13 07:33:08 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Oct 2022 07:33:08 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 14:28:02 GMT, Magnus Ihse Bursie wrote: > If is is a big change or not depends on how it affects builds on macos. I assume that clang functionality that was published in 2017 is already incorporated in the minimum supported version of Xcode on mac. (This needs to be verified, though) > > For clang on linux; Oracle do not regularly build linux with clang, nor do our GHA build scripts. I don't know if anyone regularly tests this. > > My guess is that the 3.5 limit was put in place when the clang for linux support was added, and it has not been modified since. Okay, I see, thanks for the summary! > But you are probably right that it should be a separate PR, if not for anything else so to get proper attention to it. Yes, it's probably better to get more attention for this update. > Do you want me to publish such a PR first, or do you want to continue with the current conditionals, and clean them up afterwards if/when we go to clang 5.0+? I think both is fine. But since I already have the patch ready, I suggest to move forward with it and then come back later to clean it up. 
------------- PR: https://git.openjdk.org/jdk/pull/10287 From rkennke at openjdk.org Thu Oct 13 07:33:48 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 13 Oct 2022 07:33:48 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v4] In-Reply-To: References: Message-ID: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. 
single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? > > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. 
> > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 496.076 | 493.873 | 0.45% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaKmeans | 259.384 | 258.648 | 0.28% > Philosophers | 24333.311 | 23438.22 | 3.82% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > ParMnemonics | 2016.917 | 2033.101 | -0.80% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaDoku | 2193.562 | 1958.419 | 12.01% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > Philosophers | 14268.449 | 13308.87 | 7.21% > FinagleChirper | 4722.13 | 4688.3 | 0.72% > FinagleHttp | 3497.241 | 3605.118 | -2.99% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). 
> > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: RISC-V port ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10590/files - new: https://git.openjdk.org/jdk/pull/10590/files/4ccdab8f..d9153be5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=02-03 Stats: 368 lines in 11 files changed: 89 ins; 211 del; 68 mod Patch: https://git.openjdk.org/jdk/pull/10590.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10590/head:pull/10590 PR: https://git.openjdk.org/jdk/pull/10590 From aph at openjdk.org Thu Oct 13 07:40:04 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 13 Oct 2022 07:40:04 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v4] In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 04:12:22 GMT, Hao Sun wrote: >> In this patch, we implement functions histogram_bytecode() and histogram_bytecode_pair() for interpreter AArch64 part. Similar to count_bytecode(), we use atomic operations to update the counters as well. >> >> Here shows part of the message produced with -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options after this patch. >> >> >> $ java -XX:+PrintBytecodeHistogram --version | head -20 >> openjdk 20-internal 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) >> >> Histogram of 5004099 executed bytecodes: >> >> absolute relative code name >> ---------------------------------------------------------------------- >> 319124 6.38% dc fast_aload_0 >> 313397 6.26% e0 fast_iload >> 251436 5.02% b6 invokevirtual >> 227428 4.54% 19 aload >> 166054 3.32% a7 goto >> 159167 3.18% 2b aload_1 >> 151803 3.03% de fast_aaccess_0 >> 136787 2.73% 1b iload_1 >> 124037 2.48% 36 istore >> 118791 2.37% 84 iinc >> 118121 2.36% 1c iload_2 >> 110484 2.21% a2 if_icmpge >> >> $ java -XX:+PrintBytecodePairHistogram --version | head -20 >> openjdk 20-internal 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) >> >> Histogram of 4804441 executed bytecode pairs: >> >> absolute relative codes 1st bytecode 2nd bytecode >> ---------------------------------------------------------------------- >> 77602 1.615% 84 a7 iinc goto >> 49749 1.035% 36 e0 istore fast_iload >> 48931 1.018% e0 10 fast_iload bipush >> 46294 0.964% e0 b6 fast_iload invokevirtual >> 42661 0.888% a7 e0 goto fast_iload >> 42243 0.879% 3a 19 astore aload >> 40138 0.835% 19 b9 aload invokeinterface >> 36617 0.762% dc 2b fast_aload_0 aload_1 >> 35745 0.744% b7 dc invokespecial fast_aload_0 >> 35384 0.736% 19 b6 aload invokevirtual >> 35035 0.729% b6 de invokevirtual fast_aaccess_0 >> 34667 0.722% dc b6 fast_aload_0 invokevirtual >> >> >> In order to verfiy the correctness, I took the trace information produced by -XX:+TraceBytecodes as a cross reference. 
The hit times for some bytecodes/bytecode pairs can be obtained via parsing the trace. Then I compared the hit times with the corresponding "absolute" columns. I randomly selected several bytecodes/bytecode pairs, and the manual comparion results showed that "absolute" columns are correct. >> >> Note-1: count_bytecode() is updated. 1) caller-saved registers are used as temporary registers and there is no need to save/restore them. 2) atomic_addw() should be used since the counter is of int type. >> >> Note-2: As shown by the update in file templateInterpreterGenerator.cpp, function histogram_bytecode() should be invoked only inside !PRODUCT scope. > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Remove the atomic operation to "_index" OK, thanks. ------------- Marked as reviewed by aph (Reviewer). PR: https://git.openjdk.org/jdk/pull/10642 From aph at openjdk.org Thu Oct 13 07:42:17 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 13 Oct 2022 07:42:17 GMT Subject: RFR: 8295257: Remove implicit noreg temp register arguments in aarch64 MacroAssembler In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 06:12:49 GMT, Axel Boldt-Christmas wrote: > Remove implicit `= noreg` temporary register arguments for the three methods that still have them. > * `load_heap_oop` > * `store_heap_oop` > * `load_heap_oop_not_null` > > Only `load_heap_oop` is used with the implicit `= noreg` arguments. > After [JDK-8293351](https://bugs.openjdk.org/browse/JDK-8293351) the GCs only use explicitly passed in registers. This will also be the case for generational ZGC. Where it currently requires `load_heap_oop` to provide a second temporary register. > > Testing: linux-aarch64, macosx-aarch64 tier 1-3 Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10688 From aph at openjdk.org Thu Oct 13 07:51:11 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 13 Oct 2022 07:51:11 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <3CHKC_SEGCKm2D3rHntfxV21sx0G-AakDLPPf8rPwQc=.533cc9d9-f93b-4a6f-86f6-a221452045d4@github.com> Message-ID: On Wed, 12 Oct 2022 20:35:09 GMT, Maurizio Cimadamore wrote: > Isn't it an illustration of a more general problem we have with native code where it can mess with FP environment at any time? Yes. > We already have similar problems with MXCSR register and provide verification logic (part of `-Xcheck:jni`) to catch modifications and support conditional restoration of MXCSR register on x86_64. x86_32 validates x87 control word when `-Xcheck:jni` is enabled. > > Should we do something similar here instead? The problem is that this bug is very insidious: the user probably doesn't know that there's anything wrong, and almost certainly has no idea that it's anything to do with JNI. Saving and restoring the floating-point environment across dlopen() is a compromise between adding extra code at JNI callouts, which can be expensive, and shifting the burden to the user. 
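In outline, the guard under review (quoted from os::Linux::dlopen_helper earlier in the thread) amounts to the standalone sketch below. It is simplified, with the error checks around fegetenv/fesetenv left out, and dlopen_guarded is an illustrative name rather than the actual JDK function.

    #include <dlfcn.h>
    #include <fenv.h>

    // Sketch: do not let library initializers run by dlopen() (e.g. -ffast-math
    // constructors enabling FTZ/DAZ) leak FP-mode changes into Java arithmetic.
    static void* dlopen_guarded(const char* filename, int flags) {
      fenv_t saved;
      fegetenv(&saved);                        // capture the FP environment
      void* handle = ::dlopen(filename, flags);
      fesetenv(&saved);                        // restore it, whatever the init code did
      return handle;
    }

The same pattern would apply to any other dlopen() call site that should not let a library's global initialization disturb the Java floating-point environment.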
------------- PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Thu Oct 13 07:53:12 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 13 Oct 2022 07:53:12 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: <7yaDR-VCplkpcKoGwJ4_4rZczua_vXMVVXsZ4jFbwYg=.522bd3a5-4b70-4fd7-a6af-184fde528750@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <7yaDR-VCplkpcKoGwJ4_4rZczua_vXMVVXsZ4jFbwYg=.522bd3a5-4b70-4fd7-a6af-184fde528750@github.com> Message-ID: On Wed, 12 Oct 2022 20:26:34 GMT, Jorn Vernee wrote: > Following that logic: from our perspective `dlopen` violates its ABI in certain cases. Preserving the control bits across calls to `dlopen` seems like a pragmatic solution. I'm not sure how important it is to have an opt-in for the current (broken) behavior... Not at all, I'd have thought. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From haosun at openjdk.org Thu Oct 13 08:01:06 2022 From: haosun at openjdk.org (Hao Sun) Date: Thu, 13 Oct 2022 08:01:06 GMT Subject: RFR: 8295257: Remove implicit noreg temp register arguments in aarch64 MacroAssembler In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 06:12:49 GMT, Axel Boldt-Christmas wrote: > Remove implicit `= noreg` temporary register arguments for the three methods that still have them. > * `load_heap_oop` > * `store_heap_oop` > * `load_heap_oop_not_null` > > Only `load_heap_oop` is used with the implicit `= noreg` arguments. > After [JDK-8293351](https://bugs.openjdk.org/browse/JDK-8293351) the GCs only use explicitly passed in registers. This will also be the case for generational ZGC. Where it currently requires `load_heap_oop` to provide a second temporary register. > > Testing: linux-aarch64, macosx-aarch64 tier 1-3 I wonder if we can remove the implicit `= noreg` arguments for `load_sized_value()` and `store_sized_value()` as well? ------------- PR: https://git.openjdk.org/jdk/pull/10688 From aph at openjdk.org Thu Oct 13 08:33:07 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 13 Oct 2022 08:33:07 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic Here's another possibility: save the FP environment over up- and down-calls, and restore it if the control bits have changed. We can do that in a portable way, overriding it in a more performant way with arch-specific code. 
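A portable form of that idea could look roughly like the guard below. This is only a sketch of the proposal, not existing HotSpot code; the portable baseline simply restores unconditionally, and an arch-specific version would first compare the control bits (MXCSR on x86, FPCR on AArch64) and skip the restore when nothing has changed.

    #include <cfenv>

    // Sketch: save the FP environment around a call into foreign code and put
    // it back afterwards.
    class FPEnvGuard {
      std::fenv_t _saved;
     public:
      FPEnvGuard()  { std::fegetenv(&_saved); }
      ~FPEnvGuard() { std::fesetenv(&_saved); }  // a cheap check-before-restore is the arch-specific refinement
    };

    void call_foreign(void (*fn)()) {
      FPEnvGuard guard;  // environment saved on entry
      fn();              // foreign code may flip rounding mode or FTZ/DAZ
    }                    // restored when the guard goes out of scope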
------------- PR: https://git.openjdk.org/jdk/pull/10661 From aboldtch at openjdk.org Thu Oct 13 08:37:10 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 13 Oct 2022 08:37:10 GMT Subject: RFR: 8295257: Remove implicit noreg temp register arguments in aarch64 MacroAssembler In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 07:58:55 GMT, Hao Sun wrote: > I wonder if we can remove the implicit `= noreg` arguments for `load_sized_value()` and `store_sized_value()` as well? The argument can be removed all together. `Register dst2` and `Register src2` just seems to be leftovers from when this was created from the x86 MacroAssembler. They are only used for x86_32. I did not remove them in this patch as that change is more of a trivial code cleanup. (Removing unused arguments). ------------- PR: https://git.openjdk.org/jdk/pull/10688 From haosun at openjdk.org Thu Oct 13 08:45:11 2022 From: haosun at openjdk.org (Hao Sun) Date: Thu, 13 Oct 2022 08:45:11 GMT Subject: RFR: 8295257: Remove implicit noreg temp register arguments in aarch64 MacroAssembler In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 08:33:15 GMT, Axel Boldt-Christmas wrote: > > I wonder if we can remove the implicit `= noreg` arguments for `load_sized_value()` and `store_sized_value()` as well? > > The argument can be removed all together. `Register dst2` and `Register src2` just seems to be leftovers from when this was created from the x86 MacroAssembler. They are only used for x86_32. I did not remove them in this patch as that change is more of a trivial code cleanup. (Removing unused arguments). Okay. Thanks. LGTM (I'm not a Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10688 From rehn at openjdk.org Thu Oct 13 08:50:27 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 13 Oct 2022 08:50:27 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v4] In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 07:33:48 GMT, Roman Kennke wrote: >> This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. 
More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> ### Benchmarks >> >> All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. >> >> #### DaCapo/AArch64 >> >> Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. 
The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? >> >> benchmark | baseline | fast-locking | % | size >> -- | -- | -- | -- | -- >> avrora | 27859 | 27563 | 1.07% | large >> batik | 20786 | 20847 | -0.29% | large >> biojava | 27421 | 27334 | 0.32% | default >> eclipse | 59918 | 60522 | -1.00% | large >> fop | 3670 | 3678 | -0.22% | default >> graphchi | 2088 | 2060 | 1.36% | default >> h2 | 297391 | 291292 | 2.09% | huge >> jme | 8762 | 8877 | -1.30% | default >> jython | 18938 | 18878 | 0.32% | default >> luindex | 1339 | 1325 | 1.06% | default >> lusearch | 918 | 936 | -1.92% | default >> pmd | 58291 | 58423 | -0.23% | large >> sunflow | 32617 | 24961 | 30.67% | large >> tomcat | 25481 | 25992 | -1.97% | large >> tradebeans | 314640 | 311706 | 0.94% | huge >> tradesoap | 107473 | 110246 | -2.52% | huge >> xalan | 6047 | 5882 | 2.81% | default >> zxing | 970 | 926 | 4.75% | default >> >> #### DaCapo/x86_64 >> >> The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. >> >> benchmark | baseline | fast-Locking | % | size >> -- | -- | -- | -- | -- >> avrora | 127690 | 126749 | 0.74% | large >> batik | 12736 | 12641 | 0.75% | large >> biojava | 15423 | 15404 | 0.12% | default >> eclipse | 41174 | 41498 | -0.78% | large >> fop | 2184 | 2172 | 0.55% | default >> graphchi | 1579 | 1560 | 1.22% | default >> h2 | 227614 | 230040 | -1.05% | huge >> jme | 8591 | 8398 | 2.30% | default >> jython | 13473 | 13356 | 0.88% | default >> luindex | 824 | 813 | 1.35% | default >> lusearch | 962 | 968 | -0.62% | default >> pmd | 40827 | 39654 | 2.96% | large >> sunflow | 53362 | 43475 | 22.74% | large >> tomcat | 27549 | 28029 | -1.71% | large >> tradebeans | 190757 | 190994 | -0.12% | huge >> tradesoap | 68099 | 67934 | 0.24% | huge >> xalan | 7969 | 8178 | -2.56% | default >> zxing | 1176 | 1148 | 2.44% | default >> >> #### Renaissance/AArch64 >> >> This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
>> >> benchmark | baseline | fast-locking | % >> -- | -- | -- | -- >> AkkaUct | 2558.832 | 2513.594 | 1.80% >> Reactors | 14715.626 | 14311.246 | 2.83% >> Als | 1851.485 | 1869.622 | -0.97% >> ChiSquare | 1007.788 | 1003.165 | 0.46% >> GaussMix | 1157.491 | 1149.969 | 0.65% >> LogRegression | 717.772 | 733.576 | -2.15% >> MovieLens | 7916.181 | 8002.226 | -1.08% >> NaiveBayes | 395.296 | 386.611 | 2.25% >> PageRank | 4294.939 | 4346.333 | -1.18% >> FjKmeans | 496.076 | 493.873 | 0.45% >> FutureGenetic | 2578.504 | 2589.255 | -0.42% >> Mnemonics | 4898.886 | 4903.689 | -0.10% >> ParMnemonics | 4260.507 | 4210.121 | 1.20% >> Scrabble | 139.37 | 138.312 | 0.76% >> RxScrabble | 320.114 | 322.651 | -0.79% >> Dotty | 1056.543 | 1068.492 | -1.12% >> ScalaDoku | 3443.117 | 3449.477 | -0.18% >> ScalaKmeans | 259.384 | 258.648 | 0.28% >> Philosophers | 24333.311 | 23438.22 | 3.82% >> ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% >> FinagleChirper | 6814.192 | 6853.38 | -0.57% >> FinagleHttp | 4762.902 | 4807.564 | -0.93% >> >> #### Renaissance/x86_64 >> >> benchmark | baseline | fast-locking | % >> -- | -- | -- | -- >> AkkaUct | 1117.185 | 1116.425 | 0.07% >> Reactors | 11561.354 | 11812.499 | -2.13% >> Als | 1580.838 | 1575.318 | 0.35% >> ChiSquare | 459.601 | 467.109 | -1.61% >> GaussMix | 705.944 | 685.595 | 2.97% >> LogRegression | 659.944 | 656.428 | 0.54% >> MovieLens | 7434.303 | 7592.271 | -2.08% >> NaiveBayes | 413.482 | 417.369 | -0.93% >> PageRank | 3259.233 | 3276.589 | -0.53% >> FjKmeans | 946.429 | 938.991 | 0.79% >> FutureGenetic | 1760.672 | 1815.272 | -3.01% >> ParMnemonics | 2016.917 | 2033.101 | -0.80% >> Scrabble | 147.996 | 150.084 | -1.39% >> RxScrabble | 177.755 | 177.956 | -0.11% >> Dotty | 673.754 | 683.919 | -1.49% >> ScalaDoku | 2193.562 | 1958.419 | 12.01% >> ScalaKmeans | 165.376 | 168.925 | -2.10% >> ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% >> Philosophers | 14268.449 | 13308.87 | 7.21% >> FinagleChirper | 4722.13 | 4688.3 | 0.72% >> FinagleHttp | 3497.241 | 3605.118 | -2.99% >> >> Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. >> >> I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). >> >> Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. >> >> ### Testing >> - [x] tier1 (x86_64, aarch64, x86_32) >> - [x] tier2 (x86_64, aarch64) >> - [x] tier3 (x86_64, aarch64) >> - [x] tier4 (x86_64, aarch64) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > RISC-V port On aarch64 (linux and mac) I see these variations of crashes in random tests: # Internal Error .... 
src/hotspot/share/c1/c1_Runtime1.cpp:768), pid=2884803, tid=2884996 # assert(oopDesc::is_oop(oop(obj))) failed: must be NULL or an object: 0x000000000000dead # V [libjvm.so+0x7851d4] Runtime1::monitorexit(JavaThread*, oopDesc*)+0x110 # SIGSEGV (0xb) at pc=0x0000fffc9d4e3de8, pid=1842880, tid=1842994 # V [libjvm.so+0xbf3de8] SharedRuntime::monitor_exit_helper(oopDesc*, JavaThread*)+0x24 # SIGSEGV (0xb) at pc=0x0000fffca9f00394, pid=959883, tid=959927 # V [libjvm.so+0xc90394] ObjectSynchronizer::exit(oopDesc*, JavaThread*)+0x54 ------------- PR: https://git.openjdk.org/jdk/pull/10590 From bkilambi at openjdk.org Thu Oct 13 10:12:42 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 13 Oct 2022 10:12:42 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v2] In-Reply-To: References: Message-ID: > Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - > > eor a, a, b > eor a, a, c > > can be optimized to single instruction - `eor3 a, b, c` > > This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - > > > Benchmark gain > TestEor3.test1Int 10.87% > TestEor3.test1Long 8.84% > TestEor3.test2Int 21.68% > TestEor3.test2Long 21.04% > > > The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. 
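For readers unfamiliar with the pattern the backend rule matches, the loop below is a minimal Java sketch of the kind of kernel the quoted microbenchmark presumably exercises: two dependent XORs per element, which the AArch64 backend can fuse into a single eor3 in the vectorized loop body when the SHA3 (or SVE2) extension is present. The class and method names are illustrative only and are not the PR's actual TestEor3 sources.

```
public class Eor3Kernel {
    // Two chained XORs per element: eor v, a, b; eor v, v, c.
    // With the backend rule described above, the vectorized body can use eor3 v, a, b, c.
    static void xor3(int[] r, int[] a, int[] b, int[] c) {
        for (int i = 0; i < r.length; i++) {
            r[i] = (a[i] ^ b[i]) ^ c[i];
        }
    }

    public static void main(String[] args) {
        final int n = 1024;
        int[] a = new int[n], b = new int[n], c = new int[n], r = new int[n];
        for (int i = 0; i < n; i++) { a[i] = i; b[i] = i * 31; c[i] = ~i; }
        // Warm up so the JIT compiles and (where supported) auto-vectorizes the loop.
        for (int iter = 0; iter < 20_000; iter++) {
            xor3(r, a, b, c);
        }
        System.out.println(r[42]);
    }
}
```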
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Modified JTREG test to include feature constraints ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10407/files - new: https://git.openjdk.org/jdk/pull/10407/files/b2de6107..6df4f014 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10407&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10407&range=00-01 Stats: 8 lines in 1 file changed: 0 ins; 5 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10407.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10407/head:pull/10407 PR: https://git.openjdk.org/jdk/pull/10407 From bkilambi at openjdk.org Thu Oct 13 10:12:44 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 13 Oct 2022 10:12:44 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v2] In-Reply-To: References: Message-ID: <9_Z5FH23oQ3RdzoP-FiwjNOTILIE_z6iNY___tz9g30=.ed02336f-c9c0-4c15-9d62-1a5a0600fdda@github.com> On Wed, 12 Oct 2022 10:46:17 GMT, Hao Sun wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Modified JTREG test to include feature constraints > > test/hotspot/jtreg/compiler/vectorization/TestEor3AArch64.java line 38: > >> 36: * @summary Test EOR3 Neon/SVE2 instruction for aarch64 SHA3 extension >> 37: * @library /test/lib / >> 38: * @requires os.arch == "aarch64" & vm.cpu.features ~=".*sha3.*" > > Suggestion: > > * @requires os.arch == "aarch64" & vm.cpu.features ~= ".*sha3.*" > > nit: [style] it's better to have one extra space. @shqking Thank you for the comments. I have made the suggested changes. ------------- PR: https://git.openjdk.org/jdk/pull/10407 From aboldtch at openjdk.org Thu Oct 13 10:30:44 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 13 Oct 2022 10:30:44 GMT Subject: RFR: 8295258: Add BasicType argument to AccessInternal::decorator_fixup Message-ID: There are call sites of the access API that does not specify the `INTERNAL_VALUE_IS_OOP` decorator for some reference type access. Change it so that `AccessInternal::decorator_fixup` is able to add this decorator by passing in the `BasicType` of the access. Testing: Oracle platforms tier 1-3 and GHA ------------- Commit messages: - UPSTREAM: decorator_fixup Changes: https://git.openjdk.org/jdk/pull/10692/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10692&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295258 Stats: 19 lines in 9 files changed: 2 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/10692.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10692/head:pull/10692 PR: https://git.openjdk.org/jdk/pull/10692 From rkennke at openjdk.org Thu Oct 13 10:35:16 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 13 Oct 2022 10:35:16 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v5] In-Reply-To: References: Message-ID: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. 
> > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing the lock over to the contending thread. > > As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header.
This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? > > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. > > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
> > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 496.076 | 493.873 | 0.45% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaKmeans | 259.384 | 258.648 | 0.28% > Philosophers | 24333.311 | 23438.22 | 3.82% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > ParMnemonics | 2016.917 | 2033.101 | -0.80% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaDoku | 2193.562 | 1958.419 | 12.01% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > Philosophers | 14268.449 | 13308.87 | 7.21% > FinagleChirper | 4722.13 | 4688.3 | 0.72% > FinagleHttp | 3497.241 | 3605.118 | -2.99% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). > > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'origin/fast-locking' into fast-locking - Revert "Re-use r0 in call to unlock_object()" This reverts commit ebbcb615a788998596f403b47b72cf133cb9de46. 
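To make the lock-stack described above easier to picture, here is a toy Java model of the per-thread structure. It is purely illustrative: the real data structure is native HotSpot code, holds oops rather than Object references, participates in GC root scanning, and the names and overflow handling here are invented.

```
// Toy model of the per-thread lock-stack described above (illustrative only).
final class LockStackModel {
    private final Object[] elements = new Object[8]; // typically stays at 3-5 entries
    private int top = 0;

    void push(Object lockee) {        // fast-lock: record that this thread owns the object
        elements[top++] = lockee;     // overflow/inflation fallback omitted in this sketch
    }

    void pop(Object lockee) {         // fast-unlock: expected to be the most recent entry
        assert top > 0 && elements[top - 1] == lockee;
        elements[--top] = null;
    }

    boolean owns(Object lockee) {     // "does the current thread own me?" = quick linear scan
        for (int i = 0; i < top; i++) {
            if (elements[i] == lockee) {
                return true;
            }
        }
        return false;
    }
}
```

In the real scheme, the push is paired with the CAS that sets the two low header bits to 00, and contended or recursive cases fall back to inflating an ObjectMonitor, as the description explains.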
------------- Changes: - all: https://git.openjdk.org/jdk/pull/10590/files - new: https://git.openjdk.org/jdk/pull/10590/files/d9153be5..8d146b99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=03-04 Stats: 7 lines in 3 files changed: 1 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10590.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10590/head:pull/10590 PR: https://git.openjdk.org/jdk/pull/10590 From rkennke at openjdk.org Thu Oct 13 10:36:34 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 13 Oct 2022 10:36:34 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v4] In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 08:46:45 GMT, Robbin Ehn wrote: > On aarch64 (linux and mac) I see these variations of crashes in random tests: (asserts in debug, crash in release it looks like) > > ``` > # Internal Error .... src/hotspot/share/c1/c1_Runtime1.cpp:768), pid=2884803, tid=2884996 > # assert(oopDesc::is_oop(oop(obj))) failed: must be NULL or an object: 0x000000000000dead > # V [libjvm.so+0x7851d4] Runtime1::monitorexit(JavaThread*, oopDesc*)+0x110 > ``` > > ``` > # SIGSEGV (0xb) at pc=0x0000fffc9d4e3de8, pid=1842880, tid=1842994 > # V [libjvm.so+0xbf3de8] SharedRuntime::monitor_exit_helper(oopDesc*, JavaThread*)+0x24 > ``` > > ``` > # SIGSEGV (0xb) at pc=0x0000fffca9f00394, pid=959883, tid=959927 > # V [libjvm.so+0xc90394] ObjectSynchronizer::exit(oopDesc*, JavaThread*)+0x54 > ``` Ugh. That is most likely caused by the recent change: https://github.com/rkennke/jdk/commit/ebbcb615a788998596f403b47b72cf133cb9de46 It used to be very stable before that. I have backed out that change, can you try again? Thanks, Roman ------------- PR: https://git.openjdk.org/jdk/pull/10590 From rkennke at openjdk.org Thu Oct 13 10:42:03 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 13 Oct 2022 10:42:03 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 20:41:57 GMT, Robbin Ehn wrote: > Regarding benchmarks, is it possible to get some indication what fast-locking+lillput result will be? FinagleHttp seems to suffer a bit, will Lillput give some/all of that back, or more? That particular benchmark, as some others, exhibit relatively high run-to-run variance. I have run it again many more times to average-out the variance, and I'm now getting the following results: baseline: 3503.844 ms/ops, fast-locking: 3546.344 ms/ops, percent: -1.20% That is still a slight regression, but with more confidence. Regarding Lilliput, I cannot really say at the moment. Some workloads are actually regressing with Lilliput, presumably because they are sensitive on the performance of loading the Klass* out of objects, and that is currently more complex in Lilliput (because it needs to coordinate with monitor locking). FinagleHttp seems to be one of those workloads. I am working to get rid of this limitation, and then I can be more specific. 
------------- PR: https://git.openjdk.org/jdk/pull/10590 From stefank at openjdk.org Thu Oct 13 10:52:11 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 13 Oct 2022 10:52:11 GMT Subject: RFR: 8295258: Add BasicType argument to AccessInternal::decorator_fixup In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 10:24:15 GMT, Axel Boldt-Christmas wrote: > There are call sites of the access API that does not specify the `INTERNAL_VALUE_IS_OOP` decorator for some reference type access. Change it so that `AccessInternal::decorator_fixup` is able to add this decorator by passing in the `BasicType` of the access. > > Testing: Oracle platforms tier 1-3 and GHA Marked as reviewed by stefank (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10692 From shade at openjdk.org Thu Oct 13 11:08:28 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Oct 2022 11:08:28 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v9] In-Reply-To: <1JwBm4xpt9EfiFBlkharONlS8E3RjMDXq2sCtQyu0UQ=.efc05433-4767-4863-bfbc-b5cf6aba672c@github.com> References: <_O1386bHeagTrVI68UU0GbvMWV8XtiRE3phbjYFZ81A=.b35c61a2-af07-43f8-8fab-cb218059e465@github.com> <1JwBm4xpt9EfiFBlkharONlS8E3RjMDXq2sCtQyu0UQ=.efc05433-4767-4863-bfbc-b5cf6aba672c@github.com> Message-ID: On Wed, 12 Oct 2022 11:07:36 GMT, Aleksey Shipilev wrote: > > @shipilev In a "tenth time's the charm" spirit, here's what I do think is actually a PR that can be integrated. > > Cool! I'll try to schedule the overnight build-matrix run to see if anything is broken. I was able to build the matrix of: - `make hotspot` - GCC {9, 10} - {i686, x86_64, aarch64, powerpc64le, s390x, arm, riscv64, powerpc64} - {server, client, minimal, zero} - {release, fastdebug, slowdebug, optimized} All these build with default warnings enabled, and they pass. So I think we are more or less safe with this minimization. ------------- PR: https://git.openjdk.org/jdk/pull/10414 From shade at openjdk.org Thu Oct 13 11:08:28 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Oct 2022 11:08:28 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v21] In-Reply-To: References: Message-ID: <33IO5LNbo4XDhUV2tinAvOIBs7pOne5YuVzmR6qnb2g=.6edf4423-8e3b-4a0f-8d0c-5ba89a9cd94f@github.com> On Wed, 12 Oct 2022 11:04:14 GMT, Magnus Ihse Bursie wrote: >> After [JDK-8294281](https://bugs.openjdk.org/browse/JDK-8294281), it is now possible to disable warnings for individual files instead for whole libraries. I used this opportunity to go through all disabled warnings in hotspot. >> >> Any warnings that were only triggered in a few files were removed from hotspot as a whole, and changed to be only disabled for those files. >> >> Some warnings didn't trigger in any file anymore, and could just be removed. >> >> Overall, this reduced the number of disabled warnings by roughly half for gcc, clang and visual studio. The remaining warnings are sorted in "frequency", that is, the first listed warnings are triggered in the most number of files, while the last in the fewest number of files. So if anyone were to try to address the remaining warnings, it would make sense to chop of this list from the back. >> >> I believe the warnings that are disabled on a per-file basis can most likely be fixed relatively easily. >> >> I have verified this by Oracle's internal CI system, and GitHub Actions. (But I have not yet gotten a fully green run due to instabilities in GHA, however this patch can't reasonably have anything to do with that.) 
As always, warnings tend to differ a bit between compilers, so if someone wants to take this on a spin with some other version, please go ahead. If I missed some warning, in worst case we'll just have to add it back again, and in the meanwhile `configure --disable-warnings-as-errors` is an okay workaround. >> >> It also turned out that JDK-8294281 did not save the new per-file warnings in VarDeps, so I had to move $1_WARNINGS_FLAGS from $1_BASE_CFLAGS to $1_CFLAGS (and similar for C++). >> >> Annoyingly, the assert macro triggers `tautological-constant-out-of-range-compare` on clang, so while this is a single problem in a single file, this erupts all over the place in debug builds. If this can be fixed, the ugly extra `DISABLED_WARNINGS_clang += tautological-constant-out-of-range-compare` for non-release builds can be removed. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Github workflow changes were not supposed to be in this PR... Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10414 From haosun at openjdk.org Thu Oct 13 12:01:08 2022 From: haosun at openjdk.org (Hao Sun) Date: Thu, 13 Oct 2022 12:01:08 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v2] In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 10:12:42 GMT, Bhavana Kilambi wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Modified JTREG test to include feature constraints LGTM (I'm not a Reviewer). ------------- Marked as reviewed by haosun (Author). PR: https://git.openjdk.org/jdk/pull/10407 From fyang at openjdk.org Thu Oct 13 12:28:47 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Oct 2022 12:28:47 GMT Subject: RFR: 8295270: RISC-V: Clean up and refactoring for assembler functions Message-ID: Witnessed that some high-level assember functions are placed in file assembler_riscv.hpp/cpp, such as 'movptr', 'li' and so on. These are macro-assembler functions which should be placed in macroAssembler_riscv.hpp/cpp. Meanwhile, we should also move load & store memory and control assembler functions from addresses in 'address'/'Address' form. 
Testing: Tier1 hotspot on HiFive Unmatched board {fastdebug, release}. ------------- Commit messages: - Fix trailing whitespace - 8295270: RISC-V: Clean up and refactoring for assembler functions Changes: https://git.openjdk.org/jdk/pull/10697/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10697&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295270 Stats: 1344 lines in 11 files changed: 637 ins; 601 del; 106 mod Patch: https://git.openjdk.org/jdk/pull/10697.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10697/head:pull/10697 PR: https://git.openjdk.org/jdk/pull/10697 From aboldtch at openjdk.org Thu Oct 13 12:39:51 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 13 Oct 2022 12:39:51 GMT Subject: RFR: 8295273: Remove unused argument in [load/store]_sized_value on aarch64 and riscv Message-ID: Remove the unused argument `Register [dst2/src2]` in [load/store]_sized_value on aarch64 and riscv ports. Seems like they were brought in in the initial ports [JDK-8068054](https://bugs.openjdk.org/browse/JDK-8068054) [JDK-8276799](https://bugs.openjdk.org/browse/JDK-8276799) as just straight signature copies of x86. The second register is only required on x86 for x86_32 support and is unused on riscv and aarch64. Should be a trivial removal. Testing: Cross-compiled for riscv and aarch64. Waiting for GHA ------------- Commit messages: - Cleanup [load/store]_sized_value Changes: https://git.openjdk.org/jdk/pull/10698/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10698&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295273 Stats: 8 lines in 4 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/10698.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10698/head:pull/10698 PR: https://git.openjdk.org/jdk/pull/10698 From fjiang at openjdk.org Thu Oct 13 12:57:14 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 13 Oct 2022 12:57:14 GMT Subject: RFR: 8295270: RISC-V: Clean up and refactoring for assembler functions In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 12:17:00 GMT, Fei Yang wrote: > Witnessed that some high-level assember functions are placed in file assembler_riscv.hpp/cpp, > such as 'movptr', 'li' and so on. These are macro-assembler functions which should be placed > in macroAssembler_riscv.hpp/cpp. Meanwhile, we should also move load & store memory and > control assembler functions from address of type 'address' / 'Address'. > > Testing: Tier1 hotspot on HiFive Unmatched board {fastdebug, release}. src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 678: > 676: Assembler::NAME(Rd, Rd, ((int32_t)offset << 20) >> 20); \ > 677: } else { \ > 678: int32_t offset = 0; \ It might be confusing that we have already defined `offset` in L673 with a different type. ------------- PR: https://git.openjdk.org/jdk/pull/10697 From fyang at openjdk.org Thu Oct 13 13:32:32 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Oct 2022 13:32:32 GMT Subject: RFR: 8295270: RISC-V: Clean up and refactoring for assembler functions [v2] In-Reply-To: References: Message-ID: > Witnessed that some high-level assember functions are placed in file assembler_riscv.hpp/cpp, > such as 'movptr', 'li' and so on. These are macro-assembler functions which should be placed > in macroAssembler_riscv.hpp/cpp. Meanwhile, we should also move load & store memory and > control assembler functions from address of type 'address' / 'Address'. > > Testing: Tier1 hotspot on HiFive Unmatched board {fastdebug, release}. 
Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Fix name of locals ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10697/files - new: https://git.openjdk.org/jdk/pull/10697/files/bf891cf3..c7672443 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10697&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10697&range=00-01 Stats: 25 lines in 2 files changed: 0 ins; 0 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/10697.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10697/head:pull/10697 PR: https://git.openjdk.org/jdk/pull/10697 From fyang at openjdk.org Thu Oct 13 13:32:32 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Oct 2022 13:32:32 GMT Subject: RFR: 8295270: RISC-V: Clean up and refactoring for assembler functions [v2] In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 12:54:57 GMT, Feilong Jiang wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix name of locals > > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 678: > >> 676: Assembler::NAME(Rd, Rd, ((int32_t)offset << 20) >> 20); \ >> 677: } else { \ >> 678: int32_t offset = 0; \ > > It might be confusing that we have already defined `offset` in L673 with a different type. Fixed. Thanks for pointing this out. ------------- PR: https://git.openjdk.org/jdk/pull/10697 From fyang at openjdk.org Thu Oct 13 13:57:04 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Oct 2022 13:57:04 GMT Subject: RFR: 8295273: Remove unused argument in [load/store]_sized_value on aarch64 and riscv In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 12:29:39 GMT, Axel Boldt-Christmas wrote: > Remove the unused argument `Register [dst2/src2]` in [load/store]_sized_value on aarch64 and riscv ports. > > Seems like they were brought in in the initial ports [JDK-8068054](https://bugs.openjdk.org/browse/JDK-8068054) [JDK-8276799](https://bugs.openjdk.org/browse/JDK-8276799) as just straight signature copies of x86. The second register is only required on x86 for x86_32 support and is unused on riscv and aarch64. > > Should be a trivial removal. > > Testing: Cross-compiled for riscv and aarch64. Waiting for GHA LGTM. ------------- Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/10698 From haosun at openjdk.org Thu Oct 13 14:13:11 2022 From: haosun at openjdk.org (Hao Sun) Date: Thu, 13 Oct 2022 14:13:11 GMT Subject: RFR: 8295273: Remove unused argument in [load/store]_sized_value on aarch64 and riscv In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 12:29:39 GMT, Axel Boldt-Christmas wrote: > Remove the unused argument `Register [dst2/src2]` in [load/store]_sized_value on aarch64 and riscv ports. > > Seems like they were brought in in the initial ports [JDK-8068054](https://bugs.openjdk.org/browse/JDK-8068054) [JDK-8276799](https://bugs.openjdk.org/browse/JDK-8276799) as just straight signature copies of x86. The second register is only required on x86 for x86_32 support and is unused on riscv and aarch64. > > Should be a trivial removal. > > Testing: Cross-compiled for riscv and aarch64. Waiting for GHA LGTM. (I'm not a Reviewer) ------------- Marked as reviewed by haosun (Author). 
PR: https://git.openjdk.org/jdk/pull/10698 From fyang at openjdk.org Thu Oct 13 14:25:07 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 13 Oct 2022 14:25:07 GMT Subject: RFR: 8295270: RISC-V: Clean up and refactoring for assembler functions [v3] In-Reply-To: References: Message-ID: <2naeDigOwQ7G3M-0CYtEZsvcKoqR81-gfqkzqZ10gts=.4bd5f7c8-38c8-479d-8adc-d58c95b0cb75@github.com> > Witnessed that some high-level assember functions are placed in file assembler_riscv.hpp/cpp, > such as 'movptr', 'li' and so on. These are macro-assembler functions which should be placed > in macroAssembler_riscv.hpp/cpp. Meanwhile, we should also move load & store memory and > control assembler functions from address of type 'address' / 'Address'. > > Testing: Tier1 hotspot on HiFive Unmatched board {fastdebug, release}. Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10697/files - new: https://git.openjdk.org/jdk/pull/10697/files/c7672443..d98ab458 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10697&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10697&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10697.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10697/head:pull/10697 PR: https://git.openjdk.org/jdk/pull/10697 From jwaters at openjdk.org Thu Oct 13 14:48:29 2022 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 13 Oct 2022 14:48:29 GMT Subject: RFR: 8295017: Remove Windows specific workaround in JLI_Snprintf [v5] In-Reply-To: References: Message-ID: > The C99 snprintf is available with Visual Studio 2015 and above, alongside Windows 10 and the UCRT, and is no longer identical to the outdated Windows _snprintf. Since support for the Visual C++ 2017 compiler was removed a while ago, we can now safely remove the compatibility workaround on Windows and have JLI_Snprintf simply delegate to snprintf. Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Merge branch 'openjdk:master' into patch-1 - Merge branch 'openjdk:master' into patch-1 - Comment documenting change isn't required - Merge branch 'openjdk:master' into patch-1 - Comment formatting - Remove Windows specific JLI_Snprintf implementation - Remove Windows JLI_Snprintf definition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10625/files - new: https://git.openjdk.org/jdk/pull/10625/files/8ac9b519..a24ef092 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10625&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10625&range=03-04 Stats: 2113 lines in 62 files changed: 1331 ins; 653 del; 129 mod Patch: https://git.openjdk.org/jdk/pull/10625.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10625/head:pull/10625 PR: https://git.openjdk.org/jdk/pull/10625 From jwaters at openjdk.org Thu Oct 13 14:48:30 2022 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 13 Oct 2022 14:48:30 GMT Subject: RFR: 8295017: Remove Windows specific workaround in JLI_Snprintf [v3] In-Reply-To: References: Message-ID: <0wUuynDia128uyCaMmWi7BltH8HQcyI-CKcyGcP_Ucc=.89942c4d-b2a5-4fd2-8599-0c43745057a6@github.com> On Tue, 11 Oct 2022 02:01:12 GMT, David Holmes wrote: >> Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Comment documenting change isn't required >> - Merge branch 'openjdk:master' into patch-1 >> - Comment formatting >> - Remove Windows specific JLI_Snprintf implementation >> - Remove Windows JLI_Snprintf definition > > Looks good. Thanks. @dholmes-ora could I trouble you for a sponsor? Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/10625 From fjiang at openjdk.org Thu Oct 13 15:12:39 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 13 Oct 2022 15:12:39 GMT Subject: RFR: 8295270: RISC-V: Clean up and refactoring for assembler functions [v3] In-Reply-To: <2naeDigOwQ7G3M-0CYtEZsvcKoqR81-gfqkzqZ10gts=.4bd5f7c8-38c8-479d-8adc-d58c95b0cb75@github.com> References: <2naeDigOwQ7G3M-0CYtEZsvcKoqR81-gfqkzqZ10gts=.4bd5f7c8-38c8-479d-8adc-d58c95b0cb75@github.com> Message-ID: On Thu, 13 Oct 2022 14:25:07 GMT, Fei Yang wrote: >> Witnessed that some high-level assember functions are placed in file assembler_riscv.hpp/cpp, >> such as 'movptr', 'li' and so on. These are macro-assembler functions which should be placed >> in macroAssembler_riscv.hpp/cpp. Meanwhile, we should also move load & store memory and >> control assembler functions from address of type 'address' / 'Address'. >> >> Testing: Tier1 hotspot on HiFive Unmatched board {fastdebug, release}. > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Fix Change looks good. Thanks. ------------- Marked as reviewed by fjiang (Author). 
PR: https://git.openjdk.org/jdk/pull/10697 From mgronlun at openjdk.org Thu Oct 13 16:12:50 2022 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Thu, 13 Oct 2022 16:12:50 GMT Subject: RFR: 8295274: HelidonAppTest.java fails "assert(event->should_commit()) failed: invariant" from compiled frame" Message-ID: Greetings, In [JDK-8287832](https://bugs.openjdk.org/browse/JDK-8287832), a change was made that removed the cached shouldCommit() state, under the premise that shouldCommit() is only called once. There are a few places that assert on shouldCommit(), that should have been removed as part of [JDK-8287832](https://bugs.openjdk.org/browse/JDK-8287832). This change complements these removals. Testing: jdk_jfr Thanks Markus ------------- Commit messages: - 8295274 Changes: https://git.openjdk.org/jdk/pull/10700/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10700&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295274 Stats: 4 lines in 4 files changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10700.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10700/head:pull/10700 PR: https://git.openjdk.org/jdk/pull/10700 From vlivanov at openjdk.org Thu Oct 13 18:13:41 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 13 Oct 2022 18:13:41 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <3CHKC_SEGCKm2D3rHntfxV21sx0G-AakDLPPf8rPwQc=.533cc9d9-f93b-4a6f-86f6-a221452045d4@github.com> Message-ID: On Thu, 13 Oct 2022 07:47:20 GMT, Andrew Haley wrote: > The problem is that this bug is very insidious: the user probably doesn't know that there's anything wrong, and almost certainly has no idea that it's anything to do with JNI. I'm still trying to grasp why the current problem is something different from what we experienced before. Some platforms (x86_32 and AArch32) provide a way to restore FP environment, but it is turned off by default. Why x86_64 case is different, so it requires the logic to be turned on by default? Does the same reasoning apply to those platforms as well and we want to have `AlwaysRestoreFPU` turned on by default? $ grep -r AlwaysRestoreFPU src/hotspot/ src/hotspot//cpu/x86/templateInterpreterGenerator_x86.cpp: if (AlwaysRestoreFPU) { src/hotspot//cpu/x86/sharedRuntime_x86_32.cpp: if (AlwaysRestoreFPU) { src/hotspot//cpu/arm/sharedRuntime_arm.cpp: if (AlwaysRestoreFPU) { src/hotspot//cpu/arm/templateInterpreterGenerator_arm.cpp: if (AlwaysRestoreFPU) { src/hotspot//share/runtime/globals.hpp: product(bool, AlwaysRestoreFPU, false, \ ------------- PR: https://git.openjdk.org/jdk/pull/10661 From shade at openjdk.org Thu Oct 13 19:10:36 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Oct 2022 19:10:36 GMT Subject: RFR: 8294211: Zero: Decode arch-specific error context if possible [v4] In-Reply-To: <68x5tTa2dHOc-tAj6OfoLSK2MNK2GN2io8qj1POUT2I=.b4ef9779-3b2a-4af4-aa1b-166375420547@github.com> References: <68x5tTa2dHOc-tAj6OfoLSK2MNK2GN2io8qj1POUT2I=.b4ef9779-3b2a-4af4-aa1b-166375420547@github.com> Message-ID: > After POSIX signal refactorings, Zero error handling had "regressed" a bit: Zero always gets `NULL` as `pc` in error handling code, and thus it fails with SEGV at pc=0x0. We can do better by implementing context decoding where possible. > > Unfortunately, this introduces some arch-specific code in Zero code. 
The arch-specific code is copy-pasted (with inline definitions, if needed) from the relevant `os_linux_*.cpp` files. The unimplemented arches would still report the same confusing `hs_err`-s. We can emulate (and thus test) the generic behavior using new diagnostic VM option. > > This reverts parts of [JDK-8259392](https://bugs.openjdk.org/browse/JDK-8259392). > > Sample test: > > > import java.lang.reflect.*; > import sun.misc.Unsafe; > > public class Crash { > public static void main(String... args) throws Exception { > Field f = Unsafe.class.getDeclaredField("theUnsafe"); > f.setAccessible(true); > Unsafe u = (Unsafe) f.get(null); > u.getInt(42); // accesing via broken ptr > } > } > > > Linux x86_64 Zero fastdebug crash currently: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x0000000000000000, pid=538793, tid=538794 > # > ... > # (no native frame info) > ... > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a > > > Linux x86_64 Zero fastdebug crash with this patch: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007fbbbf08b584, pid=520119, tid=520120 > # > ... > # Problematic frame: > # V [libjvm.so+0xcbe584] Unsafe_GetInt+0xe4 > .... > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a > > > Linux x86_64 Zero fastdebug crash with this patch and `-XX:-DecodeErrorContext`: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x0000000000000000, pid=520268, tid=520269 > # > ... > # Problematic frame: > # C 0x0000000000000000 > ... > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a > > > Additional testing: > - [x] Linux x86_64 Zero fastdebug eyeballing crash logs > - [x] Linux x86_64 Zero fastdebug, `tier1` > - [x] Linux {x86_64, x86_32, aarch64, arm, riscv64, s390x, ppc64le, ppc64be} Zero fastdebug builds Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into JDK-8294211-zero-error-context - Merge branch 'master' into JDK-8294211-zero-error-context - Merge branch 'master' into JDK-8294211-zero-error-context - Style nits - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10397/files - new: https://git.openjdk.org/jdk/pull/10397/files/3c3299e3..fe524d40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10397&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10397&range=02-03 Stats: 24287 lines in 607 files changed: 16878 ins; 4692 del; 2717 mod Patch: https://git.openjdk.org/jdk/pull/10397.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10397/head:pull/10397 PR: https://git.openjdk.org/jdk/pull/10397 From shade at openjdk.org Thu Oct 13 19:10:38 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Oct 2022 19:10:38 GMT Subject: RFR: 8294211: Zero: Decode arch-specific error context if possible [v3] In-Reply-To: References: <68x5tTa2dHOc-tAj6OfoLSK2MNK2GN2io8qj1POUT2I=.b4ef9779-3b2a-4af4-aa1b-166375420547@github.com> Message-ID: On Thu, 29 Sep 2022 16:09:57 GMT, Aleksey Shipilev wrote: > I think this still works. Any other reviews, please? Ping. 
:) ------------- PR: https://git.openjdk.org/jdk/pull/10397 From duke at openjdk.org Thu Oct 13 21:28:55 2022 From: duke at openjdk.org (Joshua Cao) Date: Thu, 13 Oct 2022 21:28:55 GMT Subject: RFR: 8295288: Some vm_flags tests associate with a wrong BugID Message-ID: Fixing incorrect bug IDs. IntxTest.java already has the correct bug ID, and SizeTTest has a different bug ID that is correct. All changed tests passing locally. ------------- Commit messages: - 8295288: Some vm_flags tests associate with a wrong BugID Changes: https://git.openjdk.org/jdk/pull/10703/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10703&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295288 Stats: 5 lines in 5 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10703.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10703/head:pull/10703 PR: https://git.openjdk.org/jdk/pull/10703 From phh at openjdk.org Thu Oct 13 22:21:02 2022 From: phh at openjdk.org (Paul Hohensee) Date: Thu, 13 Oct 2022 22:21:02 GMT Subject: RFR: 8295288: Some vm_flags tests associate with a wrong BugID In-Reply-To: References: Message-ID: <30byStGmwn2WwG0LQ5zJTaHyafCLtvqwyHCyymEuLvU=.4441b5ce-c9e8-4b0d-9e0e-72a816df45a0@github.com> On Thu, 13 Oct 2022 21:07:01 GMT, Joshua Cao wrote: > Fixing incorrect bug IDs. IntxTest.java already has the correct bug ID, and SizeTTest has a different bug ID that is correct. > > All changed tests passing locally. Lgtm, and trivial. ------------- Marked as reviewed by phh (Reviewer). PR: https://git.openjdk.org/jdk/pull/10703 From jwaters at openjdk.org Thu Oct 13 23:54:11 2022 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 13 Oct 2022 23:54:11 GMT Subject: Integrated: 8295017: Remove Windows specific workaround in JLI_Snprintf In-Reply-To: References: Message-ID: On Sun, 9 Oct 2022 08:03:36 GMT, Julian Waters wrote: > The C99 snprintf is available with Visual Studio 2015 and above, alongside Windows 10 and the UCRT, and is no longer identical to the outdated Windows _snprintf. Since support for the Visual C++ 2017 compiler was removed a while ago, we can now safely remove the compatibility workaround on Windows and have JLI_Snprintf simply delegate to snprintf. This pull request has now been integrated. Changeset: 2b4830a3 Author: Julian Waters Committer: David Holmes URL: https://git.openjdk.org/jdk/commit/2b4830a3959496372719270614a58737cf4deb2f Stats: 36 lines in 2 files changed: 2 ins; 34 del; 0 mod 8295017: Remove Windows specific workaround in JLI_Snprintf Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/10625 From yadongwang at openjdk.org Fri Oct 14 01:04:06 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Fri, 14 Oct 2022 01:04:06 GMT Subject: RFR: 8295270: RISC-V: Clean up and refactoring for assembler functions [v3] In-Reply-To: <2naeDigOwQ7G3M-0CYtEZsvcKoqR81-gfqkzqZ10gts=.4bd5f7c8-38c8-479d-8adc-d58c95b0cb75@github.com> References: <2naeDigOwQ7G3M-0CYtEZsvcKoqR81-gfqkzqZ10gts=.4bd5f7c8-38c8-479d-8adc-d58c95b0cb75@github.com> Message-ID: On Thu, 13 Oct 2022 14:25:07 GMT, Fei Yang wrote: >> Witnessed that some high-level assember functions are placed in file assembler_riscv.hpp/cpp, >> such as 'movptr', 'li' and so on. These are macro-assembler functions which should be placed >> in macroAssembler_riscv.hpp/cpp. Meanwhile, we should also move load & store memory and >> control assembler functions from address of type 'address' / 'Address'. >> >> Testing: Tier1 on HiFive Unmatched board {fastdebug, release}. 
> > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Fix lgtm ------------- Marked as reviewed by yadongwang (Author). PR: https://git.openjdk.org/jdk/pull/10697 From fyang at openjdk.org Fri Oct 14 01:22:58 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Oct 2022 01:22:58 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 11:26:16 GMT, Aleksey Shipilev wrote: > Here is the basic support for RISC-V: https://cr.openjdk.java.net/~shade/8291555/riscv-patch-1.patch > > -- I adapted this from AArch64 changes, and tested it very lightly. @RealFYang, can I leave the testing and follow up fixes to you? @shipilev : Sure, I am happy to to that! Thanks for porting this to RISC-V :-) ------------- PR: https://git.openjdk.org/jdk/pull/10590 From fyang at openjdk.org Fri Oct 14 01:44:14 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Oct 2022 01:44:14 GMT Subject: RFR: 8295270: RISC-V: Clean up and refactoring for assembler functions [v4] In-Reply-To: References: Message-ID: > Witnessed that some high-level assember functions are placed in file assembler_riscv.hpp/cpp, > such as 'movptr', 'li' and so on. These are macro-assembler functions which should be placed > in macroAssembler_riscv.hpp/cpp. Meanwhile, we should also move load & store memory and > control assembler functions from address of type 'address' / 'Address'. > > Testing: Tier1 on HiFive Unmatched board {fastdebug, release}. Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Fix code comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10697/files - new: https://git.openjdk.org/jdk/pull/10697/files/d98ab458..25e1baa2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10697&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10697&range=02-03 Stats: 6 lines in 1 file changed: 1 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10697.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10697/head:pull/10697 PR: https://git.openjdk.org/jdk/pull/10697 From yadongwang at openjdk.org Fri Oct 14 01:47:05 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Fri, 14 Oct 2022 01:47:05 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible [v5] In-Reply-To: <33fbgEcGKSX560jIketZj2R2zR-t9B68NfuaJyI1ffA=.d324fd6a-e0c9-49f1-817a-88e2251b897f@github.com> References: <33fbgEcGKSX560jIketZj2R2zR-t9B68NfuaJyI1ffA=.d324fd6a-e0c9-49f1-817a-88e2251b897f@github.com> Message-ID: On Wed, 12 Oct 2022 09:21:54 GMT, Xiaolin Zheng wrote: >> This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. >> >> Chaining PR #10421. >> >> 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] >> 2. Performance: conservatively no regressions observed. [3] >> >> The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. >> >> >> Having tested several times hotspot tier1~tier4; Testing another turn on board. 
>> >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html >> [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html >> [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Keep aligning int32_t style lgtm ------------- Marked as reviewed by yadongwang (Author). PR: https://git.openjdk.org/jdk/pull/10643 From yadongwang at openjdk.org Fri Oct 14 02:40:04 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Fri, 14 Oct 2022 02:40:04 GMT Subject: RFR: 8295009: RISC-V: Interpreter intrinsify Thread.currentThread() Message-ID: Calling intrinsic version of Thread.currentThread() in interpreter is ~30% faster on the Unmatched board: Before: Benchmark Mode Cnt Score Error Units MyBenchmark.testCurrentThread avgt 5 4665.765 ? 212.906 ns/op After: Benchmark Mode Cnt Score Error Units MyBenchmark.testCurrentThread avgt 5 3381.415 ? 223.005 ns/op Tier1 and jdk_loom have been tested on unmatched. ------------- Commit messages: - 8295009: RISC-V: Interpreter intrinsify Thread.currentThread() Changes: https://git.openjdk.org/jdk/pull/10709/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10709&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295009 Stats: 15 lines in 4 files changed: 11 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10709.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10709/head:pull/10709 PR: https://git.openjdk.org/jdk/pull/10709 From xlinzheng at openjdk.org Fri Oct 14 03:00:59 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Fri, 14 Oct 2022 03:00:59 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 06:40:04 GMT, Fei Yang wrote: >> This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. >> >> Chaining PR #10421. >> >> 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] >> 2. Performance: conservatively no regressions observed. [3] >> >> The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. >> >> >> Having tested several times hotspot tier1~tier4; Testing another turn on board. >> >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html >> [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html >> [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html > > Personally, I prefer the following style: > > __ relocate(spec, [&] { > int32_t off = 0; > la_patchable(t0, RuntimeAddress(entry), off); > jalr(x1, t0, off); > }); > > Then the code looks more unified to me. And we don't need to extend a new la_patchable interface. Thanks for reviewing! @RealFYang @yadongw Will wait for #10697 to be merged first for it seems needs a redo if mine goes first. 
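The Thread.currentThread() numbers quoted above come from a JMH microbenchmark whose source is not shown in the thread. Below is a minimal sketch of what such a benchmark might look like: the MyBenchmark/testCurrentThread names are taken from the quoted output, while the body and the repetition count are assumptions, and the interpreter path would presumably be exercised by running with -Xint or otherwise keeping the method from being compiled.

```
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MyBenchmark {

    @Benchmark
    public Thread testCurrentThread() {
        Thread t = null;
        // Repeat the call so its cost dominates the harness overhead;
        // returning the last value keeps it from being optimized away.
        for (int i = 0; i < 1000; i++) {
            t = Thread.currentThread();
        }
        return t;
    }
}
```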
------------- PR: https://git.openjdk.org/jdk/pull/10643 From fyang at openjdk.org Fri Oct 14 03:27:57 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Oct 2022 03:27:57 GMT Subject: RFR: 8295270: RISC-V: Clean up and refactoring for assembler functions [v3] In-Reply-To: References: <2naeDigOwQ7G3M-0CYtEZsvcKoqR81-gfqkzqZ10gts=.4bd5f7c8-38c8-479d-8adc-d58c95b0cb75@github.com> Message-ID: On Thu, 13 Oct 2022 14:52:14 GMT, Feilong Jiang wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix > > Change looks good. Thanks. @feilongjiang @feilongjiang : Thanks for reviewing this non-trivial change. Need a Reviewer then. @shipilev : Want to take a look? Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10697 From fyang at openjdk.org Fri Oct 14 04:02:59 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Oct 2022 04:02:59 GMT Subject: RFR: 8295009: RISC-V: Interpreter intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 02:32:28 GMT, Yadong Wang wrote: > Calling intrinsic version of Thread.currentThread() in interpreter is ~30% faster on the Unmatched board: > Before: > Benchmark Mode Cnt Score Error Units > MyBenchmark.testCurrentThread avgt 5 4665.765 ? 212.906 ns/op > After: > Benchmark Mode Cnt Score Error Units > MyBenchmark.testCurrentThread avgt 5 3381.415 ? 223.005 ns/op > > Tier1 and jdk_loom have been tested on unmatched. LGTM. ------------- Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/10709 From shade at openjdk.org Fri Oct 14 06:13:04 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Oct 2022 06:13:04 GMT Subject: RFR: 8295009: RISC-V: Interpreter intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 02:32:28 GMT, Yadong Wang wrote: > Calling intrinsic version of Thread.currentThread() in interpreter is ~30% faster on the Unmatched board: > Before: > Benchmark Mode Cnt Score Error Units > MyBenchmark.testCurrentThread avgt 5 4665.765 ? 212.906 ns/op > After: > Benchmark Mode Cnt Score Error Units > MyBenchmark.testCurrentThread avgt 5 3381.415 ? 223.005 ns/op > > Tier1 and jdk_loom have been tested on unmatched. Looks fine. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.org/jdk/pull/10709 From shade at openjdk.org Fri Oct 14 06:17:28 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 14 Oct 2022 06:17:28 GMT Subject: RFR: 8295270: RISC-V: Clean up and refactoring for assembler functions [v4] In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 01:44:14 GMT, Fei Yang wrote: >> Witnessed that some high-level assember functions are placed in file assembler_riscv.hpp/cpp, >> such as 'movptr', 'li' and so on. These are macro-assembler functions which should be placed >> in macroAssembler_riscv.hpp/cpp. Meanwhile, we should also move load & store memory and >> control assembler functions from address of type 'address' / 'Address'. >> >> Testing: Tier1 on HiFive Unmatched board {fastdebug, release}. > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Fix code comments Looks okay from a brief look. src/hotspot/cpu/riscv/assembler_riscv.hpp line 2417: > 2415: bool do_compress() const { > 2416: return UseRVC && in_compressible_region(); > 2417: } Indenting had been lost. 
src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 480: > 478: void bge(Register Rs1, Register Rs2, Label &L, bool is_far = false); > 479: void bltu(Register Rs1, Register Rs2, Label &L, bool is_far = false); > 480: void bgeu(Register Rs1, Register Rs2, Label &L, bool is_far = false); Do you want to line things like these up? For example: void beq (Register Rs1, Register Rs2, Label &L, bool is_far = false); void bne (Register Rs1, Register Rs2, Label &L, bool is_far = false); void blt (Register Rs1, Register Rs2, Label &L, bool is_far = false); void bge (Register Rs1, Register Rs2, Label &L, bool is_far = false); void bltu(Register Rs1, Register Rs2, Label &L, bool is_far = false); void bgeu(Register Rs1, Register Rs2, Label &L, bool is_far = false); ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.org/jdk/pull/10697 From rehn at openjdk.org Fri Oct 14 06:45:08 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 14 Oct 2022 06:45:08 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v4] In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 10:34:04 GMT, Roman Kennke wrote: > It used to be very stable before that. I have backed out that change, can you try again? Seems fine now, thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10590 From fyang at openjdk.org Fri Oct 14 07:11:49 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Oct 2022 07:11:49 GMT Subject: RFR: 8295270: RISC-V: Clean up and refactoring for assembler functions [v5] In-Reply-To: References: Message-ID: <0e5g2-n_YEABkppANoSeWVfWcTK3evsj7jepa4Fy8JY=.fa62b8fa-287c-4111-8002-76b5a005edfc@github.com> > Witnessed that some high-level assember functions are placed in file assembler_riscv.hpp/cpp, > such as 'movptr', 'li' and so on. These are macro-assembler functions which should be placed > in macroAssembler_riscv.hpp/cpp. Meanwhile, we should also move load & store memory and > control assembler functions from address of type 'address' / 'Address'. > > Testing: Tier1 on HiFive Unmatched board {fastdebug, release}. Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Fix indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10697/files - new: https://git.openjdk.org/jdk/pull/10697/files/25e1baa2..bc2e56ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10697&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10697&range=03-04 Stats: 20 lines in 2 files changed: 4 ins; 1 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/10697.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10697/head:pull/10697 PR: https://git.openjdk.org/jdk/pull/10697 From fyang at openjdk.org Fri Oct 14 07:11:51 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Oct 2022 07:11:51 GMT Subject: RFR: 8295270: RISC-V: Clean up and refactoring for assembler functions [v4] In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 06:14:26 GMT, Aleksey Shipilev wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix code comments > > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 480: > >> 478: void bge(Register Rs1, Register Rs2, Label &L, bool is_far = false); >> 479: void bltu(Register Rs1, Register Rs2, Label &L, bool is_far = false); >> 480: void bgeu(Register Rs1, Register Rs2, Label &L, bool is_far = false); > > Do you want to line things like these up? 
For example: > > > > void beq (Register Rs1, Register Rs2, Label &L, bool is_far = false); > void bne (Register Rs1, Register Rs2, Label &L, bool is_far = false); > void blt (Register Rs1, Register Rs2, Label &L, bool is_far = false); > void bge (Register Rs1, Register Rs2, Label &L, bool is_far = false); > void bltu(Register Rs1, Register Rs2, Label &L, bool is_far = false); > void bgeu(Register Rs1, Register Rs2, Label &L, bool is_far = false); Yes, that looks better. Fixed. ------------- PR: https://git.openjdk.org/jdk/pull/10697 From fyang at openjdk.org Fri Oct 14 07:58:06 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Oct 2022 07:58:06 GMT Subject: Integrated: 8295270: RISC-V: Clean up and refactoring for assembler functions In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 12:17:00 GMT, Fei Yang wrote: > Witnessed that some high-level assember functions are placed in file assembler_riscv.hpp/cpp, > such as 'movptr', 'li' and so on. These are macro-assembler functions which should be placed > in macroAssembler_riscv.hpp/cpp. Meanwhile, we should also move load & store memory and > control assembler functions from address of type 'address' / 'Address'. > > Testing: Tier1 on HiFive Unmatched board {fastdebug, release}. This pull request has now been integrated. Changeset: 3d75e88e Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/3d75e88eb25f56ed2214496826004578c2c75012 Stats: 1354 lines in 12 files changed: 641 ins; 603 del; 110 mod 8295270: RISC-V: Clean up and refactoring for assembler functions Reviewed-by: fjiang, yadongwang, shade ------------- PR: https://git.openjdk.org/jdk/pull/10697 From jnordstrom at openjdk.org Fri Oct 14 08:52:11 2022 From: jnordstrom at openjdk.org (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Fri, 14 Oct 2022 08:52:11 GMT Subject: RFR: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events [v2] In-Reply-To: References: Message-ID: > Changed the JFR chunk rotation lock object to specific internal class. This allows that specific Object.wait() event to be skipped, thus not adding JFR internal noise to recordings. > > # Testing > - jdk_jfr Joakim Nordstr?m has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements - Merge branch 'master' into JDK-8286707-jfr-dont-commit-jfr-internal-jdk-javamonitorwait-events - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events ------------- Changes: - all: https://git.openjdk.org/jdk/pull/8883/files - new: https://git.openjdk.org/jdk/pull/8883/files/755d2359..a1a95ed9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=8883&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=8883&range=00-01 Stats: 507955 lines in 7712 files changed: 264969 ins; 180116 del; 62870 mod Patch: https://git.openjdk.org/jdk/pull/8883.diff Fetch: git fetch https://git.openjdk.org/jdk pull/8883/head:pull/8883 PR: https://git.openjdk.org/jdk/pull/8883 From xlinzheng at openjdk.org Fri Oct 14 09:21:05 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Fri, 14 Oct 2022 09:21:05 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible [v6] In-Reply-To: References: Message-ID: > This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. > > Chaining PR #10421. > > 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] > 2. Performance: conservatively no regressions observed. [3] > > The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. > > > Having tested several times hotspot tier1~tier4; Testing another turn on board. > > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html > [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html > [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: Merge branch 'master' into riscv-rvc-checkin-second-half-part ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10643/files - new: https://git.openjdk.org/jdk/pull/10643/files/83f3598a..0989bbc1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10643&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10643&range=04-05 Stats: 11417 lines in 239 files changed: 7643 ins; 2386 del; 1388 mod Patch: https://git.openjdk.org/jdk/pull/10643.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10643/head:pull/10643 PR: https://git.openjdk.org/jdk/pull/10643 From xlinzheng at openjdk.org Fri Oct 14 10:46:44 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Fri, 14 Oct 2022 10:46:44 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible [v7] In-Reply-To: References: Message-ID: > This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. > > Chaining PR #10421. > > 1. 
Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] > 2. Performance: conservatively no regressions observed. [3] > > The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. > > > Having tested several times hotspot tier1~tier4; Testing another turn on board. > > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html > [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html > [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html Xiaolin Zheng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Merge remote-tracking branch 'jdk20/master' into riscv-rvc-checkin-second-half-part - Merge branch 'master' into riscv-rvc-checkin-second-half-part - Keep aligning int32_t style - remove a dummy line, and a simple polish by the way - swap the order - Change the style as to comments - [7] Blacklist mode - [6] RVC: IncompressibleRegions for relocations ------------- Changes: https://git.openjdk.org/jdk/pull/10643/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10643&range=06 Stats: 312 lines in 12 files changed: 148 ins; 3 del; 161 mod Patch: https://git.openjdk.org/jdk/pull/10643.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10643/head:pull/10643 PR: https://git.openjdk.org/jdk/pull/10643 From jnordstrom at openjdk.org Fri Oct 14 12:08:12 2022 From: jnordstrom at openjdk.org (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Fri, 14 Oct 2022 12:08:12 GMT Subject: RFR: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events [v2] In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 08:52:11 GMT, Joakim Nordstr?m wrote: >> Changed the JFR chunk rotation lock object to specific internal class. This allows that specific Object.wait() event to be skipped, thus not adding JFR internal noise to recordings. >> >> # Testing >> - jdk_jfr > > Joakim Nordstr?m has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Merge branch 'master' into JDK-8286707-jfr-dont-commit-jfr-internal-jdk-javamonitorwait-events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events I've re-opened this, and made slight alterations after suggestions from @mgronlun ------------- PR: https://git.openjdk.org/jdk/pull/8883 From fyang at openjdk.org Fri Oct 14 12:23:08 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Oct 2022 12:23:08 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible [v7] In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 10:46:44 GMT, Xiaolin Zheng wrote: >> This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. 
>> >> Chaining PR #10421. >> >> 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] >> 2. Performance: conservatively no regressions observed. [3] >> >> The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. >> >> >> Having tested several times hotspot tier1~tier4; Testing another turn on board. >> >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html >> [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html >> [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html > > Xiaolin Zheng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge remote-tracking branch 'jdk20/master' into riscv-rvc-checkin-second-half-part > - Merge branch 'master' into riscv-rvc-checkin-second-half-part > - Keep aligning int32_t style > - remove a dummy line, and a simple polish by the way > - swap the order > - Change the style as to comments > - [7] Blacklist mode > - [6] RVC: IncompressibleRegions for relocations Thanks for rebasing this. Still looks good. ------------- Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/10643 From egahlin at openjdk.org Fri Oct 14 12:34:59 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 14 Oct 2022 12:34:59 GMT Subject: RFR: 8295274: HelidonAppTest.java fails "assert(event->should_commit()) failed: invariant" from compiled frame" In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 14:38:59 GMT, Markus Gr?nlund wrote: > Greetings, > > In [JDK-8287832](https://bugs.openjdk.org/browse/JDK-8287832), a change was made that removed the cached shouldCommit() state, under the premise that shouldCommit() is only called once. There are a few places that assert on shouldCommit(), that should have been removed as part of [JDK-8287832](https://bugs.openjdk.org/browse/JDK-8287832). > > This change complements these removals. > > Testing: jdk_jfr > > Thanks > Markus Marked as reviewed by egahlin (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10700 From mgronlun at openjdk.org Fri Oct 14 12:38:07 2022 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Fri, 14 Oct 2022 12:38:07 GMT Subject: Integrated: 8295274: HelidonAppTest.java fails "assert(event->should_commit()) failed: invariant" from compiled frame" In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 14:38:59 GMT, Markus Gr?nlund wrote: > Greetings, > > In [JDK-8287832](https://bugs.openjdk.org/browse/JDK-8287832), a change was made that removed the cached shouldCommit() state, under the premise that shouldCommit() is only called once. There are a few places that assert on shouldCommit(), that should have been removed as part of [JDK-8287832](https://bugs.openjdk.org/browse/JDK-8287832). > > This change complements these removals. > > Testing: jdk_jfr > > Thanks > Markus This pull request has now been integrated. 
Changeset: 21e4f06a Author: Markus Gr?nlund URL: https://git.openjdk.org/jdk/commit/21e4f06ada24098dad4e71b0f9c13afeff87c24b Stats: 4 lines in 4 files changed: 0 ins; 4 del; 0 mod 8295274: HelidonAppTest.java fails "assert(event->should_commit()) failed: invariant" from compiled frame" Reviewed-by: egahlin ------------- PR: https://git.openjdk.org/jdk/pull/10700 From egahlin at openjdk.org Fri Oct 14 13:14:10 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 14 Oct 2022 13:14:10 GMT Subject: RFR: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events [v2] In-Reply-To: References: Message-ID: <4FmscqGrdyJjWKjIoEulvWOyAoIwCanjyXS5jSp2_og=.4a1fef5a-6a53-4de3-b0b3-0cbaa4341da6@github.com> On Fri, 14 Oct 2022 08:52:11 GMT, Joakim Nordstr?m wrote: >> Changed the JFR chunk rotation lock object to specific internal class. This allows that specific Object.wait() event to be skipped, thus not adding JFR internal noise to recordings. >> >> # Testing >> - jdk_jfr > > Joakim Nordstr?m has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Merge branch 'master' into JDK-8286707-jfr-dont-commit-jfr-internal-jdk-javamonitorwait-events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events Marked as reviewed by egahlin (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/8883 From duke at openjdk.org Fri Oct 14 13:39:42 2022 From: duke at openjdk.org (Joshua Cao) Date: Fri, 14 Oct 2022 13:39:42 GMT Subject: Integrated: 8295288: Some vm_flags tests associate with a wrong BugID In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 21:07:01 GMT, Joshua Cao wrote: > Fixing incorrect bug IDs. IntxTest.java already has the correct bug ID, and SizeTTest has a different bug ID that is correct. > > All changed tests passing locally. This pull request has now been integrated. Changeset: 3dbc38a2 Author: Joshua Cao Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/3dbc38a2c903f533ace847a3bc0d2687f263fafd Stats: 5 lines in 5 files changed: 0 ins; 0 del; 5 mod 8295288: Some vm_flags tests associate with a wrong BugID Reviewed-by: phh ------------- PR: https://git.openjdk.org/jdk/pull/10703 From fyang at openjdk.org Fri Oct 14 13:47:13 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Oct 2022 13:47:13 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: References: Message-ID: <05W6k3vqT1b5IGhd653G8zPjCbtiN7HFg8KzZsiMorQ=.38f418d5-540e-46af-a72c-9d6b4471428a@github.com> On Fri, 14 Oct 2022 01:19:27 GMT, Fei Yang wrote: > > Here is the basic support for RISC-V: https://cr.openjdk.java.net/~shade/8291555/riscv-patch-1.patch > > -- I adapted this from AArch64 changes, and tested it very lightly. @RealFYang, can I leave the testing and follow up fixes to you? > > @shipilev : Sure, I am happy to to that! Thanks for porting this to RISC-V :-) @shipilev : After applying this on today's jdk master, linux-riscv64 fastdebug fail to build on HiFive Unmatched. 
I see JVM crash happens during the build process. I suppose you carried out the test with some release build, right? ------------- PR: https://git.openjdk.org/jdk/pull/10590 From rkennke at openjdk.org Fri Oct 14 14:30:00 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 14 Oct 2022 14:30:00 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: <05W6k3vqT1b5IGhd653G8zPjCbtiN7HFg8KzZsiMorQ=.38f418d5-540e-46af-a72c-9d6b4471428a@github.com> References: <05W6k3vqT1b5IGhd653G8zPjCbtiN7HFg8KzZsiMorQ=.38f418d5-540e-46af-a72c-9d6b4471428a@github.com> Message-ID: On Fri, 14 Oct 2022 13:45:07 GMT, Fei Yang wrote: > > > Here is the basic support for RISC-V: https://cr.openjdk.java.net/~shade/8291555/riscv-patch-1.patch > > > -- I adapted this from AArch64 changes, and tested it very lightly. @RealFYang, can I leave the testing and follow up fixes to you? > > > > > > @shipilev : Sure, I am happy to to that! Thanks for porting this to RISC-V :-) > > @shipilev : After applying this on today's jdk master, linux-riscv64 fastdebug fail to build on HiFive Unmatched. I see JVM crash happens during the build process. I suppose you carried out the test with some release build, right? Have you applied the whole PR? Or only the patch that @shipilev provided. Because only the patch without the rest of the PR is bound to fail. ------------- PR: https://git.openjdk.org/jdk/pull/10590 From fyang at openjdk.org Fri Oct 14 14:35:07 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Oct 2022 14:35:07 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: References: <05W6k3vqT1b5IGhd653G8zPjCbtiN7HFg8KzZsiMorQ=.38f418d5-540e-46af-a72c-9d6b4471428a@github.com> Message-ID: <9KWs3-ICjuSPKWkcn-hTz0V2rMUrn8B6aqmE2spm5es=.cc94175e-a8f9-468a-991a-656ee2c8c581@github.com> On Fri, 14 Oct 2022 14:26:20 GMT, Roman Kennke wrote: > > > > Here is the basic support for RISC-V: https://cr.openjdk.java.net/~shade/8291555/riscv-patch-1.patch > > > > -- I adapted this from AArch64 changes, and tested it very lightly. @RealFYang, can I leave the testing and follow up fixes to you? > > > > > > > > > @shipilev : Sure, I am happy to to that! Thanks for porting this to RISC-V :-) > > > > > > @shipilev : After applying this on today's jdk master, linux-riscv64 fastdebug fail to build on HiFive Unmatched. I see JVM crash happens during the build process. I suppose you carried out the test with some release build, right? > > Have you applied the whole PR? Or only the patch that @shipilev provided. Because only the patch without the rest of the PR is bound to fail. Yes, the whole PR: https://patch-diff.githubusercontent.com/raw/openjdk/jdk/pull/10590.diff ------------- PR: https://git.openjdk.org/jdk/pull/10590 From rkennke at amazon.de Fri Oct 14 14:36:42 2022 From: rkennke at amazon.de (Kennke, Roman) Date: Fri, 14 Oct 2022 14:36:42 +0000 Subject: RFC: Draft JEP: 64 bit object headers Message-ID: <640D82DD-1774-4EEA-8D47-48E09AB42EAA@amazon.de> Hello all, I have created a draft JEP for 64 bit object headers. See the Jira issue for details: https://bugs.openjdk.org/browse/JDK-8294992 Comments and feedback are welcome! Thanks, Roman Amazon Development Center Germany GmbH Krausenstr. 
38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879 From rkennke at openjdk.org Fri Oct 14 14:41:07 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 14 Oct 2022 14:41:07 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: <9KWs3-ICjuSPKWkcn-hTz0V2rMUrn8B6aqmE2spm5es=.cc94175e-a8f9-468a-991a-656ee2c8c581@github.com> References: <05W6k3vqT1b5IGhd653G8zPjCbtiN7HFg8KzZsiMorQ=.38f418d5-540e-46af-a72c-9d6b4471428a@github.com> <9KWs3-ICjuSPKWkcn-hTz0V2rMUrn8B6aqmE2spm5es=.cc94175e-a8f9-468a-991a-656ee2c8c581@github.com> Message-ID: On Fri, 14 Oct 2022 14:32:57 GMT, Fei Yang wrote: > > > > > Here is the basic support for RISC-V: https://cr.openjdk.java.net/~shade/8291555/riscv-patch-1.patch > > > > > -- I adapted this from AArch64 changes, and tested it very lightly. @RealFYang, can I leave the testing and follow up fixes to you? > > > > > > > > > > > > @shipilev : Sure, I am happy to to that! Thanks for porting this to RISC-V :-) > > > > > > > > > @shipilev : After applying this on today's jdk master, linux-riscv64 fastdebug fail to build on HiFive Unmatched. I see JVM crash happens during the build process. I suppose you carried out the test with some release build, right? > > > > > > Have you applied the whole PR? Or only the patch that @shipilev provided. Because only the patch without the rest of the PR is bound to fail. > > Yes, the whole PR: https://patch-diff.githubusercontent.com/raw/openjdk/jdk/pull/10590.diff The PR reports a merge conflict in risc-v code, when applied vs latest tip. Have you resolved that? GHA (which includes risc-v) is happy, otherwise. ------------- PR: https://git.openjdk.org/jdk/pull/10590 From fyang at openjdk.org Fri Oct 14 14:56:11 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 14 Oct 2022 14:56:11 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: References: <05W6k3vqT1b5IGhd653G8zPjCbtiN7HFg8KzZsiMorQ=.38f418d5-540e-46af-a72c-9d6b4471428a@github.com> <9KWs3-ICjuSPKWkcn-hTz0V2rMUrn8B6aqmE2spm5es=.cc94175e-a8f9-468a-991a-656ee2c8c581@github.com> Message-ID: On Fri, 14 Oct 2022 14:39:01 GMT, Roman Kennke wrote: > > > > > > Here is the basic support for RISC-V: https://cr.openjdk.java.net/~shade/8291555/riscv-patch-1.patch > > > > > > -- I adapted this from AArch64 changes, and tested it very lightly. @RealFYang, can I leave the testing and follow up fixes to you? > > > > > > > > > > > > > > > @shipilev : Sure, I am happy to to that! Thanks for porting this to RISC-V :-) > > > > > > > > > > > > @shipilev : After applying this on today's jdk master, linux-riscv64 fastdebug fail to build on HiFive Unmatched. I see JVM crash happens during the build process. I suppose you carried out the test with some release build, right? > > > > > > > > > Have you applied the whole PR? Or only the patch that @shipilev provided. Because only the patch without the rest of the PR is bound to fail. > > > > > > Yes, the whole PR: https://patch-diff.githubusercontent.com/raw/openjdk/jdk/pull/10590.diff > > The PR reports a merge conflict in risc-v code, when applied vs latest tip. Have you resolved that? GHA (which includes risc-v) is happy, otherwise. @rkennke : I did see some "Hunk succeeded" messages for the risc-v part when applying the change with: $ patch -p1 < ~/10590.diff But I didn't check whether that will cause a problem here. 
patching file src/hotspot/cpu/riscv/c1_CodeStubs_riscv.cpp patching file src/hotspot/cpu/riscv/c1_LIRAssembler_riscv.cpp patching file src/hotspot/cpu/riscv/c1_LIRGenerator_riscv.cpp patching file src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp Hunk #1 succeeded at 58 (offset -1 lines). Hunk #2 succeeded at 67 (offset -1 lines). patching file src/hotspot/cpu/riscv/c1_Runtime1_riscv.cpp patching file src/hotspot/cpu/riscv/interp_masm_riscv.cpp patching file src/hotspot/cpu/riscv/macroAssembler_riscv.cpp Hunk #1 succeeded at 2499 (offset 324 lines). Hunk #2 succeeded at 4474 (offset 330 lines). patching file src/hotspot/cpu/riscv/macroAssembler_riscv.hpp Hunk #1 succeeded at 869 with fuzz 2 (offset 313 lines). Hunk #2 succeeded at 1252 (offset 325 lines). patching file src/hotspot/cpu/riscv/riscv.ad Hunk #1 succeeded at 2385 (offset 7 lines). Hunk #2 succeeded at 2407 (offset 7 lines). Hunk #3 succeeded at 2433 (offset 7 lines). Hunk #4 succeeded at 10403 (offset 33 lines). Hunk #5 succeeded at 10417 (offset 33 lines). patching file src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp Hunk #1 succeeded at 975 (offset 21 lines). Hunk #2 succeeded at 1030 (offset 21 lines). Hunk #3 succeeded at 1042 (offset 21 lines). Hunk #4 succeeded at 1058 (offset 21 lines). Hunk #5 succeeded at 1316 (offset 24 lines). Hunk #6 succeeded at 1416 (offset 24 lines). Hunk #7 succeeded at 1492 (offset 24 lines). Hunk #8 succeeded at 1517 (offset 24 lines). Hunk #9 succeeded at 1621 (offset 24 lines). ------------- PR: https://git.openjdk.org/jdk/pull/10590 From rkennke at openjdk.org Fri Oct 14 15:42:01 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 14 Oct 2022 15:42:01 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: References: <05W6k3vqT1b5IGhd653G8zPjCbtiN7HFg8KzZsiMorQ=.38f418d5-540e-46af-a72c-9d6b4471428a@github.com> <9KWs3-ICjuSPKWkcn-hTz0V2rMUrn8B6aqmE2spm5es=.cc94175e-a8f9-468a-991a-656ee2c8c581@github.com> Message-ID: <2abWu-ITUoN-hNBTy6f0qQN-Q5XuAF3XXbTe7Kz63iU=.350a2155-f2ef-4909-98d8-350306413f74@github.com> On Fri, 14 Oct 2022 14:53:57 GMT, Fei Yang wrote: > > > > > > > Here is the basic support for RISC-V: https://cr.openjdk.java.net/~shade/8291555/riscv-patch-1.patch > > > > > > > -- I adapted this from AArch64 changes, and tested it very lightly. @RealFYang, can I leave the testing and follow up fixes to you? > > > > > > > > > > > > > > > > > > @shipilev : Sure, I am happy to to that! Thanks for porting this to RISC-V :-) > > > > > > > > > > > > > > > @shipilev : After applying this on today's jdk master, linux-riscv64 fastdebug fail to build on HiFive Unmatched. I see JVM crash happens during the build process. I suppose you carried out the test with some release build, right? > > > > > > > > > > > > Have you applied the whole PR? Or only the patch that @shipilev provided. Because only the patch without the rest of the PR is bound to fail. > > > > > > > > > Yes, the whole PR: https://patch-diff.githubusercontent.com/raw/openjdk/jdk/pull/10590.diff > > > > > > The PR reports a merge conflict in risc-v code, when applied vs latest tip. Have you resolved that? GHA (which includes risc-v) is happy, otherwise. > > @rkennke : I did see some "Hunk succeeded" messages for the risc-v part when applying the change with: $ patch -p1 < ~/10590.diff But I didn't check whether that will cause a problem here. If you take the latest code from this PR, it would already have the patch applied. No need to patch it again. 
------------- PR: https://git.openjdk.org/jdk/pull/10590 From duke at openjdk.org Fri Oct 14 21:38:22 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 14 Oct 2022 21:38:22 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions Message-ID: Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. - Added a JMH perf test. - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. Perf before: Benchmark (dataSize) (provider) Mode Cnt Score Error Units Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s and after: Benchmark (dataSize) (provider) Mode Cnt Score Error Units Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s ------------- Commit messages: - missed white-space fix - - Fix whitespace and copyright statements - Merge remote-tracking branch 'vpaprotsk/master' into avx512-poly - Poly1305 AVX512 intrinsic for x86_64 Changes: https://git.openjdk.org/jdk/pull/10582/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8288047 Stats: 1676 lines in 29 files changed: 1665 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Oct 14 21:38:22 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 14 Oct 2022 21:38:22 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions In-Reply-To: References: Message-ID: On Wed, 5 Oct 2022 21:28:26 GMT, vpaprotsk wrote: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 
14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s I am part of Intel Java Team ------------- PR: https://git.openjdk.org/jdk/pull/10582 From kbarrett at openjdk.org Sat Oct 15 08:14:09 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 15 Oct 2022 08:14:09 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v21] In-Reply-To: References: Message-ID: <5CNdaZqYt3cnZZBbs9QDN4K7ebprUsUV3W4-D0m32lA=.d46ec09f-97ea-4656-a3fc-78ac5190e76b@github.com> On Wed, 12 Oct 2022 11:04:14 GMT, Magnus Ihse Bursie wrote: >> After [JDK-8294281](https://bugs.openjdk.org/browse/JDK-8294281), it is now possible to disable warnings for individual files instead for whole libraries. I used this opportunity to go through all disabled warnings in hotspot. >> >> Any warnings that were only triggered in a few files were removed from hotspot as a whole, and changed to be only disabled for those files. >> >> Some warnings didn't trigger in any file anymore, and could just be removed. >> >> Overall, this reduced the number of disabled warnings by roughly half for gcc, clang and visual studio. The remaining warnings are sorted in "frequency", that is, the first listed warnings are triggered in the most number of files, while the last in the fewest number of files. So if anyone were to try to address the remaining warnings, it would make sense to chop of this list from the back. >> >> I believe the warnings that are disabled on a per-file basis can most likely be fixed relatively easily. >> >> I have verified this by Oracle's internal CI system, and GitHub Actions. (But I have not yet gotten a fully green run due to instabilities in GHA, however this patch can't reasonably have anything to do with that.) As always, warnings tend to differ a bit between compilers, so if someone wants to take this on a spin with some other version, please go ahead. If I missed some warning, in worst case we'll just have to add it back again, and in the meanwhile `configure --disable-warnings-as-errors` is an okay workaround. >> >> It also turned out that JDK-8294281 did not save the new per-file warnings in VarDeps, so I had to move $1_WARNINGS_FLAGS from $1_BASE_CFLAGS to $1_CFLAGS (and similar for C++). >> >> Annoyingly, the assert macro triggers `tautological-constant-out-of-range-compare` on clang, so while this is a single problem in a single file, this erupts all over the place in debug builds. If this can be fixed, the ugly extra `DISABLED_WARNINGS_clang += tautological-constant-out-of-range-compare` for non-release builds can be removed. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Github workflow changes were not supposed to be in this PR... Looks good. 
------------- PR: https://git.openjdk.org/jdk/pull/10414 From kbarrett at openjdk.org Sat Oct 15 08:14:10 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 15 Oct 2022 08:14:10 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v21] In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 13:42:39 GMT, Magnus Ihse Bursie wrote: >> make/hotspot/lib/CompileJvm.gmk line 92: >> >>> 90: >>> 91: DISABLED_WARNINGS_clang := ignored-qualifiers sometimes-uninitialized \ >>> 92: missing-braces delete-non-abstract-non-virtual-dtor unknown-pragmas >> >> Shouldn't shift-negative-value be in the clang list too? > > Well, there is currently no instance of clang complaining about this. This could be due to: > > * This warning does not really exist on clang > * Or it is not enabled by our current clang flags > * Or the code which triggers the warning in gcc is not reached by clang > * Or clang is smarter than gcc and can determine that the usage is ok after all > * Or clang is dumber than gcc and does not even see that there could have been a problem... > > ... > > I'm kind of reluctant to add warnings to this list that have not occurred for real. My suggestion is that we add it here if we ever see it making incorrect claims. Ok? I'm worried about someone encountering it and uglifying code to work around it. OTOH, I don't know why we're not seeing this warning with clang, as there is shared code that should trigger it (and does for gcc). So I'm okay with it as is. ------------- PR: https://git.openjdk.org/jdk/pull/10414 From stuefe at openjdk.org Sat Oct 15 13:38:39 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 15 Oct 2022 13:38:39 GMT Subject: RFR: JDK-8293114: GC should trim the native heap [v2] In-Reply-To: References: <23KpPM4oPV6F1nz3g5CvIqvuX-ANcsMH4GuVNXjR-Lw=.b8d0fa2d-bb85-4899-8e21-f68ea64b988d@github.com> Message-ID: On Thu, 1 Sep 2022 06:47:27 GMT, Thomas Stuefe wrote: >> This RFE adds the option to auto-trim the Glibc heap as part of the GC cycle. If the VM process suffered high temporary malloc spikes (regardless whether from JVM- or user code), this could recover significant amounts of memory. >> >> We discussed this a year ago [1], but the item got pushed to the bottom of my work pile, therefore it took longer than I thought. >> >> ### Motivation >> >> The Glibc allocator is reluctant to return memory to the OS, much more so than other allocators. Temporary malloc spikes often carry over as permanent RSS increase. >> >> Note that C-heap retention is difficult to observe. Since it is freed memory, it won't show up in NMT, it is just a part of private RSS. >> >> Theoretically, retained memory is not lost since it will be reused by future mallocs. Retaining memory is therefore a bet on the future behavior of the app. The allocator bets on the application needing memory in the near future, and to satisfy that need via malloc. >> >> But an app's malloc load can fluctuate wildly, with temporary spikes and long idle periods. And if the app rolls its own custom allocators atop of mmap, as hotspot does, a lot of that memory cannot be reused even though it counts toward its memory footprint. >> >> To help, Glibc exports an API to trim the C-heap: `malloc_trim(3)`. With JDK 18 [2], SAP contributed a new jcmd command to *manually* trim the C-heap on Linux. This RFE adds a complementary way to trim automatically. >> >> #### Is this even a problem? >> >> Do we have high malloc spikes in the JVM process? 
We assume that malloc load from hotspot is usually low since hotspot typically clusters allocations into custom areas - metaspace, code heap, arenas. >> >> But arenas are subject to Glibc mem retention too. I was surprised by that since I assumed 32k arena chunks were too big to be subject of Glibc retention. But I saw in experiments that high arena peaks often cause lasting RSS increase. >> >> And of course, both hotspot and JDK do a lot of finer-granular mallocs outside of custom allocators. >> >> But many cases of high memory retention in Glibc I have seen in third-party JNI code. Libraries allocate large buffers via malloc as temporary buffers. In fact, since we introduced the jcmd "System.trim_native_heap", some of our customers started to call this command periodically in scripts to counter these issues. >> >> Therefore I think while high malloc spikes are atypical for a JVM process, they can happen. Having a way to auto-trim the native heap makes sense. >> >> ### When should we trim? >> >> We want to trim when we know there is a lull in malloc activity coming. But we have no knowledge of the future. >> >> We could build a heuristic based on malloc frequency. But on closer inspection that is difficult. We cannot use NMT, since NMT has no complete picture (only knows hotspot) and is usually disabled in production anyway. The only way to get *all* mallocs would be to use Glibc malloc hooks. We have done so in desperate cases at SAP, but Glibc removed malloc hooks in 2.35. It would be a messy solution anyway; best to avoid it. >> >> The next best thing is synchronizing with the larger C-heap users in the VM: compiler and GC. But compiler turns out not to be such a problem, since the compiler uses arenas, and arena chunks are buffered in a free pool with a five-second delay. That means compiler activity that happens in bursts, like at VM startup, will just shuffle arena chunks around from/to the arena free pool, never bothering to call malloc or free. >> >> That leaves the GC, which was also the experts' recommendation in last year's discussion [1]. Most GCs do uncommit, and trimming the native heap fits well into this. And we want to time the trim to not get into the way of a GC. Plus, integrating trims into the GC cycle lets us reuse GC logging and timing, thereby making RSS changes caused by trim-native visible to the analyst. >> >> >> ### How it works: >> >> Patch adds new options (experimental for now, and shared among all GCs): >> >> >> -XX:+GCTrimNativeHeap >> -XX:GCTrimNativeHeapInterval= (defaults to 60) >> >> >> `GCTrimNativeHeap` is off by default. If enabled, it will cause the VM to trim the native heap on full GCs as well as periodically. The period is defined by `GCTrimNativeHeapInterval`. Periodic trimming can be completely switched off with `GCTrimNativeHeapInterval=0`; in that case, we will only trim on full GCs. >> >> ### Examples: >> >> This is an artificial test that causes two high malloc spikes with long idle periods. Observe how RSS recovers with trim but stays up without trim. The trim interval was set to 15 seconds for the test, and no GC was invoked here; this is periodic trimming. >> >> ![alloc-test](http://cr.openjdk.java.net/~stuefe/other/autotrim/rss-all-collectors.png) >> >> (See here for parameters: [run script](http://cr.openjdk.java.net/~stuefe/other/autotrim/run-all.sh) ) >> >> Spring pet clinic boots up, then idles. Once with, once without trim, with the trim interval at 60 seconds default. 
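(Flag-wise such a run needs nothing beyond the new switches above, e.g. something like `-XX:+UnlockExperimentalVMOptions -XX:+GCTrimNativeHeap`, plus `-XX:GCTrimNativeHeapInterval=...` when deviating from the 60-second default; the exact parameters used are in the linked run scripts.)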
Of course, if it were actually doing something instead of idling, trim effects would be smaller. But the point of trimming is to recover memory in idle periods. >> >> ![petclinic bootup](http://cr.openjdk.java.net/~stuefe/other/autotrim/spring-petclinic-rss-with-and-without-trim.png)) >> >> (See here for parameters: [run script](http://cr.openjdk.java.net/~stuefe/other/autotrim/run-petclinic-boot.sh) ) >> >> >> >> ### Implementation >> >> One problem I faced when implementing this was that trimming was non-interruptable. GCs usually split the uncommit work into smaller portions, which is impossible for `malloc_trim()`. >> >> So very slow trims could introduce longer GC pauses. I did not want this, therefore I implemented two ways to trim: >> 1) GCs can opt to trim asynchronously. In that case, a `NativeTrimmer` thread runs on behalf of the GC and takes care of all trimming. The GC just nudges the `NativeTrimmer` at the end of its GC cycle, but the trim itself runs concurrently. >> 2) GCs can do the trim inside their own thread, synchronously. It will have to wait until the trim is done. >> >> (1) has the advantage of giving us periodic trims even without GC activity (Shenandoah does this out of the box). >> >> #### Serial >> >> Serial does the trimming synchronously as part of a full GC, and only then. I did not want to spawn a separate thread for the SerialGC. Therefore Serial is the only GC that does not offer periodic trimming, it just trims on full GC. >> >> #### Parallel, G1, Z >> >> All of them do the trimming asynchronously via `NativeTrimmer`. They schedule the native trim at the end of a full collection. They also pause the trimming at the beginning of a cycle to not trim during GCs. >> >> #### Shenandoah >> >> Shenandoah does the trimming synchronously in its service thread, similar to how it handles uncommits. Since the service thread already runs concurrently and continuously, it can do periodic trimming; no need to spin a new thread. And this way we can reuse the Shenandoah timing classes. >> >> ### Patch details >> >> - adds three new functions to the `os` namespace: >> - `os::trim_native_heap()` implementing trim >> - `os::can_trim_native_heap()` and `os::should_trim_native_heap()` to return whether platform supports trimming resp. whether the platform considers trimming to be useful. >> - replaces implementation of the cmd "System.trim_native_heap" with the new `os::trim_native_heap` >> - provides a new wrapper function wrapping the tedious `mallinfo()` vs `mallinfo2()` business: `os::Linux::get_mallinfo()` >> - adds a GC-shared utility class, `GCTrimNative`, that takes care of trimming and GC-logging and houses the `NativeTrimmer` thread class. >> - adds a regression test >> >> >> ### Tests >> >> Tested older Glibc (2.31), and newer Glibc (2.35) (`mallinfo()` vs` mallinfo2()`), on Linux x64. >> >> The rest of the tests will be done by GHA and in our SAP nightlies. >> >> >> ### Remarks >> >> #### How about other allocators? >> >> I have seen this retention problem mainly with the Glibc and the AIX libc. Muslc returns memory more eagerly to the OS. I also tested with jemalloc and found it also reclaims more aggressively, therefore I don't think MacOS or BSD are affected that much by retention either. >> >> #### Trim costs? >> >> Trim-native is a tradeoff between memory and performance. We pay >> - The cost to do the trim depends on how much is trimmed. Time ranges on my machine between < 1ms for no-op trims, to ~800ms for 32GB trims. 
>> - The cost for re-acquiring the memory, should the memory be needed again, is the second cost factor. >> >> #### Predicting malloc_trim effects? >> >> `ShenandoahUncommit` avoids uncommits if they are not necessary, thus avoiding work and gc log spamming. I liked that and tried to follow that example. Tried to devise a way to predict the effect trim could have based on allocator info from mallinfo(3). That was quite frustrating since the documentation was confusing and I had to do a lot of experimenting. In the end, I came up with a heuristic to prevent obviously pointless trim attempts; see `os::should_trim_native_heap()`. I am not completely happy with it. >> >> #### glibc.malloc.trim_threshold? >> >> glibc has a tunable that looks like it could influence the willingness of Glibc to return memory to the OS, the "trim_threshold". In practice, I could not get it to do anything useful. Regardless of the setting, it never seemed to influence the trimming behavior. Even if it would work, I'm not sure we'd want to use that, since by doing malloc_trim manually we can space out the trims as we see fit, instead of paying the trim price for free(3). >> >> >> - [1] https://mail.openjdk.org/pipermail/hotspot-dev/2021-August/054323.html >> - [2] https://bugs.openjdk.org/browse/JDK-8269345 > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - reduce test runtime on slow hardware > - make tests more stable on slow hardware Not yet bot ------------- PR: https://git.openjdk.org/jdk/pull/10085 From luhenry at openjdk.org Sat Oct 15 14:30:27 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Sat, 15 Oct 2022 14:30:27 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V Message-ID: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. [1] https://github.com/riscv/riscv-CMOs [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero ------------- Commit messages: - 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V Changes: https://git.openjdk.org/jdk/pull/10718/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295282 Stats: 114 lines in 8 files changed: 112 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From kbarrett at openjdk.org Sun Oct 16 05:36:03 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 16 Oct 2022 05:36:03 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v6] In-Reply-To: References: Message-ID: <6VbPACPCtCbUs4cHQTBQk_m6PjSowEhvVJ1ijPJumLE=.0ee6521c-35fa-434a-be9d-6f3907a41a28@github.com> > 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal > 8155996: Improve concurrent refinement green zone control > 8134303: Introduce -XX:-G1UseConcRefinement > > Please review this change to the control of concurrent refinement. > > This new controller takes a different approach to the problem, addressing a > number of issues. 
> > The old controller used a multiple of the target number of cards to determine > the range over which increasing numbers of refinement threads should be > activated, and finally activating mutator refinement. This has a variety of > problems. It doesn't account for the processing rate, the rate of new dirty > cards, or the time available to perform the processing. This often leads to > unnecessary spikes in the number of running refinement threads. It also tends > to drive the pending number to the target quickly and keep it there, removing > the benefit from having pending dirty cards filter out new cards for nearby > writes. It can't delay and leave excess cards in the queue because it could > be a long time before another buffer is enqueued. > > The old controller was triggered by mutator threads enqueing card buffers, > when the number of cards in the queue exceeded a threshold near the target. > This required a complex activation protocol between the mutators and the > refinement threads. > > With the new controller there is a primary refinement thread that periodically > estimates how many refinement threads need to be running to reach the target > in time for the next GC, along with whether to also activate mutator > refinement. If the primary thread stops running because it isn't currently > needed, it sleeps for a period and reevaluates on wakeup. This eliminates any > involvement in the activation of refinement threads by mutator threads. > > The estimate of how many refinement threads are needed uses a prediction of > time until the next GC, the number of buffered cards, the predicted rate of > new dirty cards, and the predicted refinement rate. The number of running > threads is adjusted based on these periodically performed estimates. > > This new approach allows more dirty cards to be left in the queue until late > in the mutator phase, typically reducing the rate of new dirty cards, which > reduces the amount of concurrent refinement work needed. > > It also smooths out the number of running refinement threads, eliminating the > unnecessarily large spikes that are common with the old method. One benefit > is that the number of refinement threads (lazily) allocated is often much > lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem > described in JDK-8153225.) > > This change also provides a new method for calculating for the number of dirty > cards that should be pending at the start of a GC. While this calculation is > conceptually distinct from the thread control, the two were significanly > intertwined in the old controller. Changing this calculation separately and > first would have changed the behavior of the old controller in ways that might > have introduced regressions. Changing it after the thread control was changed > would have made it more difficult to test and measure the thread control in a > desirable configuration. > > The old calculation had various problems that are described in JDK-8155996. > In particular, it can get more or less stuck at low values, and is slow to > respond to changes. > > The old controller provided a number of product options, none of which were > very useful for real applications, and none of which are very applicable to > the new controller. All of these are being obsoleted. 
> > -XX:-G1UseAdaptiveConcRefinement > -XX:G1ConcRefinementGreenZone= > -XX:G1ConcRefinementYellowZone= > -XX:G1ConcRefinementRedZone= > -XX:G1ConcRefinementThresholdStep= > > The new controller *could* use G1ConcRefinementGreenZone to provide a fixed > value for the target number of cards, though it is poorly named for that. > > A configuration that was useful for some kinds of debugging and testing was to > disable G1UseAdaptiveConcRefinement and set g1ConcRefinementGreenZone to a > very large value, effectively disabling concurrent refinement. To support > this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic > option has been added (see JDK-8155996). > > The other options are meaningless for the new controller. > > Because of these option changes, a CSR and a release note need to accompany > this change. > > Testing: > mach5 tier1-6 > various performance tests. > local (linux-x64) tier1 with -XX:-G1UseConcRefinement > > Performance testing found no regressions, but also little or no improvement > with default options, which was expected. With default options most of our > performance tests do very little concurrent refinement. And even for those > that do, while the old controller had a number of problems, the impact of > those problems is small and hard to measure for most applications. > > When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare > better, particularly when also reducing MaxGCPauseMillis. specjbb2015 with > MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options > held constant) showed a statistically significant improvement of about 4.5% > for critical-jOPS. Using the changed controller, the difference between this > configuration and the default is fairly small, while the baseline shows > significant degradation with the more restrictive options. > > For all tests and configurations the new controller often creates many fewer > refinement threads. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - adjust young target length periodically - use cards in thread-buffers when revising young list target length - remove remset sampler - move remset-driven young-gen resizing - fix type of predict_dirtied_cards_in_threa_buffers - Merge branch 'master' into crt2 - comments around alloc_bytes_rate being zero - tschatzl comments - changed threads wanted logging per kstefanj - s/max_cards/mutator_refinement_threshold/ - ... and 12 more: https://git.openjdk.org/jdk/compare/8487c56f...1631a61a ------------- Changes: https://git.openjdk.org/jdk/pull/10256/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10256&range=05 Stats: 1605 lines in 24 files changed: 665 ins; 662 del; 278 mod Patch: https://git.openjdk.org/jdk/pull/10256.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10256/head:pull/10256 PR: https://git.openjdk.org/jdk/pull/10256 From kbarrett at openjdk.org Sun Oct 16 05:36:03 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 16 Oct 2022 05:36:03 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v3] In-Reply-To: References: Message-ID: On Tue, 27 Sep 2022 12:44:43 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: >> >> - wanted vs needed nomenclature >> - remove several spurious "scan" >> - delay => wait_time_ms > > Some typos. Going to do some testing. 
@tschatzl did some additional performance testing and found a regression. The periodic remset sampling with associated young gen target length adjustment can interact poorly with this change. This change tries to be lazy about performing the concurrent refinement, delaying to late in the mutator phase. Because the target length adjustment happens relatively infrequently (300ms period by default), it may make a large change. But if the length is reduced by enough to make a GC needed soon, it may be too late to do much of the needed refinement. This can lead to significantly increased pause time. The solution is to move the target length adjustment into the primary refinement thread. Currently doing more perf testing. ------------- PR: https://git.openjdk.org/jdk/pull/10256 From iwalulya at openjdk.org Sun Oct 16 07:29:57 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Sun, 16 Oct 2022 07:29:57 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v6] In-Reply-To: <6VbPACPCtCbUs4cHQTBQk_m6PjSowEhvVJ1ijPJumLE=.0ee6521c-35fa-434a-be9d-6f3907a41a28@github.com> References: <6VbPACPCtCbUs4cHQTBQk_m6PjSowEhvVJ1ijPJumLE=.0ee6521c-35fa-434a-be9d-6f3907a41a28@github.com> Message-ID: On Sun, 16 Oct 2022 05:36:03 GMT, Kim Barrett wrote: >> 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal >> 8155996: Improve concurrent refinement green zone control >> 8134303: Introduce -XX:-G1UseConcRefinement >> >> Please review this change to the control of concurrent refinement. >> >> This new controller takes a different approach to the problem, addressing a >> number of issues. >> >> The old controller used a multiple of the target number of cards to determine >> the range over which increasing numbers of refinement threads should be >> activated, and finally activating mutator refinement. This has a variety of >> problems. It doesn't account for the processing rate, the rate of new dirty >> cards, or the time available to perform the processing. This often leads to >> unnecessary spikes in the number of running refinement threads. It also tends >> to drive the pending number to the target quickly and keep it there, removing >> the benefit from having pending dirty cards filter out new cards for nearby >> writes. It can't delay and leave excess cards in the queue because it could >> be a long time before another buffer is enqueued. >> >> The old controller was triggered by mutator threads enqueing card buffers, >> when the number of cards in the queue exceeded a threshold near the target. >> This required a complex activation protocol between the mutators and the >> refinement threads. >> >> With the new controller there is a primary refinement thread that periodically >> estimates how many refinement threads need to be running to reach the target >> in time for the next GC, along with whether to also activate mutator >> refinement. If the primary thread stops running because it isn't currently >> needed, it sleeps for a period and reevaluates on wakeup. This eliminates any >> involvement in the activation of refinement threads by mutator threads. >> >> The estimate of how many refinement threads are needed uses a prediction of >> time until the next GC, the number of buffered cards, the predicted rate of >> new dirty cards, and the predicted refinement rate. The number of running >> threads is adjusted based on these periodically performed estimates. 
>> >> This new approach allows more dirty cards to be left in the queue until late >> in the mutator phase, typically reducing the rate of new dirty cards, which >> reduces the amount of concurrent refinement work needed. >> >> It also smooths out the number of running refinement threads, eliminating the >> unnecessarily large spikes that are common with the old method. One benefit >> is that the number of refinement threads (lazily) allocated is often much >> lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem >> described in JDK-8153225.) >> >> This change also provides a new method for calculating for the number of dirty >> cards that should be pending at the start of a GC. While this calculation is >> conceptually distinct from the thread control, the two were significanly >> intertwined in the old controller. Changing this calculation separately and >> first would have changed the behavior of the old controller in ways that might >> have introduced regressions. Changing it after the thread control was changed >> would have made it more difficult to test and measure the thread control in a >> desirable configuration. >> >> The old calculation had various problems that are described in JDK-8155996. >> In particular, it can get more or less stuck at low values, and is slow to >> respond to changes. >> >> The old controller provided a number of product options, none of which were >> very useful for real applications, and none of which are very applicable to >> the new controller. All of these are being obsoleted. >> >> -XX:-G1UseAdaptiveConcRefinement >> -XX:G1ConcRefinementGreenZone= >> -XX:G1ConcRefinementYellowZone= >> -XX:G1ConcRefinementRedZone= >> -XX:G1ConcRefinementThresholdStep= >> >> The new controller *could* use G1ConcRefinementGreenZone to provide a fixed >> value for the target number of cards, though it is poorly named for that. >> >> A configuration that was useful for some kinds of debugging and testing was to >> disable G1UseAdaptiveConcRefinement and set g1ConcRefinementGreenZone to a >> very large value, effectively disabling concurrent refinement. To support >> this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic >> option has been added (see JDK-8155996). >> >> The other options are meaningless for the new controller. >> >> Because of these option changes, a CSR and a release note need to accompany >> this change. >> >> Testing: >> mach5 tier1-6 >> various performance tests. >> local (linux-x64) tier1 with -XX:-G1UseConcRefinement >> >> Performance testing found no regressions, but also little or no improvement >> with default options, which was expected. With default options most of our >> performance tests do very little concurrent refinement. And even for those >> that do, while the old controller had a number of problems, the impact of >> those problems is small and hard to measure for most applications. >> >> When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare >> better, particularly when also reducing MaxGCPauseMillis. specjbb2015 with >> MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options >> held constant) showed a statistically significant improvement of about 4.5% >> for critical-jOPS. Using the changed controller, the difference between this >> configuration and the default is fairly small, while the baseline shows >> significant degradation with the more restrictive options. 
>> >> For all tests and configurations the new controller often creates many fewer >> refinement threads. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - adjust young target length periodically > - use cards in thread-buffers when revising young list target length > - remove remset sampler > - move remset-driven young-gen resizing > - fix type of predict_dirtied_cards_in_threa_buffers > - Merge branch 'master' into crt2 > - comments around alloc_bytes_rate being zero > - tschatzl comments > - changed threads wanted logging per kstefanj > - s/max_cards/mutator_refinement_threshold/ > - ... and 12 more: https://git.openjdk.org/jdk/compare/8487c56f...1631a61a LGTM! src/hotspot/share/gc/g1/g1YoungCollector.cpp line 498: > 496: // Flush dirty card queues to qset, so later phases don't need to account > 497: // for partially filled per-thread queues and such. > 498: flush_dirty_card_queues(); Can also fix the existing issue to call `flush_dirty_card_queues();` a bit earlier if you hadn't already filed it as a separate bug. ------------- Marked as reviewed by iwalulya (Reviewer). PR: https://git.openjdk.org/jdk/pull/10256 From kbarrett at openjdk.org Sun Oct 16 08:43:51 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 16 Oct 2022 08:43:51 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v6] In-Reply-To: References: <6VbPACPCtCbUs4cHQTBQk_m6PjSowEhvVJ1ijPJumLE=.0ee6521c-35fa-434a-be9d-6f3907a41a28@github.com> Message-ID: On Sun, 16 Oct 2022 07:26:00 GMT, Ivan Walulya wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - adjust young target length periodically >> - use cards in thread-buffers when revising young list target length >> - remove remset sampler >> - move remset-driven young-gen resizing >> - fix type of predict_dirtied_cards_in_threa_buffers >> - Merge branch 'master' into crt2 >> - comments around alloc_bytes_rate being zero >> - tschatzl comments >> - changed threads wanted logging per kstefanj >> - s/max_cards/mutator_refinement_threshold/ >> - ... and 12 more: https://git.openjdk.org/jdk/compare/8487c56f...1631a61a > > src/hotspot/share/gc/g1/g1YoungCollector.cpp line 498: > >> 496: // Flush dirty card queues to qset, so later phases don't need to account >> 497: // for partially filled per-thread queues and such. >> 498: flush_dirty_card_queues(); > > Can also fix the existing issue to call `flush_dirty_card_queues();` a bit earlier if you hadn't already filed it as a separate bug. Already a bug for this - https://bugs.openjdk.org/browse/JDK-8295319. I'm still thinking about / looking at that one. I'm not sure moving it earlier is correct. Might be that record_concurrent_refinement_stats should be combined with concatenate_logs, maybe where concatenate_logs / flush_dirty_card_queues is now. I think it can be addressed separately from the new concurrent refinement controller. 
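For readers following the thread-control discussion quoted above, the "how many refinement threads are wanted" estimate boils down to something like the following stand-alone C++ sketch. The names, structure and numbers here are invented for illustration and are not the actual HotSpot code; the real controller also feeds predictors and decides when to activate mutator refinement.

#include <algorithm>
#include <cmath>
#include <cstdio>

// Illustrative model of the periodic estimate described in the PR text.
// All names and values are hypothetical; this is not HotSpot code.
struct RefinementInputs {
  double pending_cards;           // dirty cards currently queued
  double new_card_rate;           // predicted new dirty cards per ms
  double refine_rate_per_thread;  // cards one refinement thread processes per ms
  double time_to_next_gc_ms;      // predicted time until the next GC
  double target_pending_cards;    // cards we want left pending at GC start
};

// Number of refinement threads that should be running so that the pending
// count reaches the target just in time for the next GC.
static unsigned threads_wanted(const RefinementInputs& in, unsigned max_threads) {
  double cards_at_gc = in.pending_cards + in.new_card_rate * in.time_to_next_gc_ms;
  double excess = cards_at_gc - in.target_pending_cards;
  if (excess <= 0.0 || in.time_to_next_gc_ms <= 0.0) {
    return 0;  // be lazy: leave the cards queued to filter nearby writes
  }
  double rate_needed = excess / in.time_to_next_gc_ms;
  unsigned wanted = (unsigned)std::ceil(rate_needed / in.refine_rate_per_thread);
  // If even max_threads cannot keep up, the real controller would also
  // activate mutator refinement (not modelled here).
  return std::min(wanted, max_threads);
}

int main() {
  RefinementInputs in{50000.0, 200.0, 1500.0, 400.0, 20000.0};
  std::printf("refinement threads wanted: %u\n", threads_wanted(in, 8));
  return 0;
}

The early-out when there is no excess is the laziness described above: as long as the queue can still grow to the target before the next GC, no refinement threads need to run.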
------------- PR: https://git.openjdk.org/jdk/pull/10256 From yadongwang at openjdk.org Sun Oct 16 10:41:06 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Sun, 16 Oct 2022 10:41:06 GMT Subject: Integrated: 8295009: RISC-V: Interpreter intrinsify Thread.currentThread() In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 02:32:28 GMT, Yadong Wang wrote: > Calling intrinsic version of Thread.currentThread() in interpreter is ~30% faster on the Unmatched board: > Before: > Benchmark Mode Cnt Score Error Units > MyBenchmark.testCurrentThread avgt 5 4665.765 ? 212.906 ns/op > After: > Benchmark Mode Cnt Score Error Units > MyBenchmark.testCurrentThread avgt 5 3381.415 ? 223.005 ns/op > > Tier1 and jdk_loom have been tested on unmatched. This pull request has now been integrated. Changeset: d3781ac8 Author: Yadong Wang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/d3781ac8a38943d8a20304e770b01d5418ee33d0 Stats: 15 lines in 4 files changed: 11 ins; 0 del; 4 mod 8295009: RISC-V: Interpreter intrinsify Thread.currentThread() Reviewed-by: fyang, shade ------------- PR: https://git.openjdk.org/jdk/pull/10709 From dholmes at openjdk.org Mon Oct 17 02:16:21 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 17 Oct 2022 02:16:21 GMT Subject: RFR: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events [v2] In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 08:52:11 GMT, Joakim Nordstr?m wrote: >> Changed the JFR chunk rotation lock object to specific internal class. This allows that specific Object.wait() event to be skipped, thus not adding JFR internal noise to recordings. >> >> # Testing >> - jdk_jfr > > Joakim Nordstr?m has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Merge branch 'master' into JDK-8286707-jfr-dont-commit-jfr-internal-jdk-javamonitorwait-events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events I am still not happy about this approach. JFR is written in Java and so triggers numerous events that are defined for Java code - that is a simple consequence of implementing it in Java. If there is to be filtering of events that originate from JFR Java code then there should be a general purpose filtering mechanism available to say "exclude JFR code" or "include JFR code" as desired. And to me the filtering mechanism should reside in the event commit code inside the JFR code not in the event-posting code in the shared components of the VM. 
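To make the kind of general-purpose filter meant here a bit more concrete, a minimal sketch could look like the following. This is a self-contained C++ illustration with invented names; it is not the actual JFR commit path or API, just the shape of "mark JFR-internal sections and decide at commit time whether to keep their events".

#include <cstdio>

namespace jfr_filter_sketch {

// Each thread tracks whether it is currently inside JFR-internal code.
thread_local int jfr_internal_depth = 0;

struct JfrInternalScope {          // placed around JFR-internal sections
  JfrInternalScope()  { ++jfr_internal_depth; }
  ~JfrInternalScope() { --jfr_internal_depth; }
};

// Recording-level policy: "exclude JFR code" vs "include JFR code".
static bool include_jfr_internal_events = false;

struct Event {
  const char* name;
  long duration_ns;
};

// The decision lives in the commit step, not in the shared event-posting code.
void commit(const Event& e) {
  if (jfr_internal_depth > 0 && !include_jfr_internal_events) {
    return;                        // drop JFR-internal noise
  }
  std::printf("committed %s (%ld ns)\n", e.name, e.duration_ns);
}

}  // namespace jfr_filter_sketch

int main() {
  using namespace jfr_filter_sketch;
  commit({"jdk.JavaMonitorWait", 1200});    // ordinary event: kept
  {
    JfrInternalScope in_jfr;                // e.g. a chunk-rotation wait
    commit({"jdk.JavaMonitorWait", 900});   // JFR-internal event: filtered
  }
  return 0;
}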
------------- PR: https://git.openjdk.org/jdk/pull/8883 From xlinzheng at openjdk.org Mon Oct 17 04:12:38 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Mon, 17 Oct 2022 04:12:38 GMT Subject: RFR: 8295110: RISC-V: Mark out relocations as incompressible [v7] In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 10:46:44 GMT, Xiaolin Zheng wrote: >> This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. >> >> Chaining PR #10421. >> >> 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] >> 2. Performance: conservatively no regressions observed. [3] >> >> The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. >> >> >> Having tested several times hotspot tier1~tier4; Testing another turn on board. >> >> >> >> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html >> [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html >> [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html > > Xiaolin Zheng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge remote-tracking branch 'jdk20/master' into riscv-rvc-checkin-second-half-part > - Merge branch 'master' into riscv-rvc-checkin-second-half-part > - Keep aligning int32_t style > - remove a dummy line, and a simple polish by the way > - swap the order > - Change the style as to comments > - [7] Blacklist mode > - [6] RVC: IncompressibleRegions for relocations Test results (new code for refactoring in this patch) on QEMU & my board in the several days look okay. ------------- PR: https://git.openjdk.org/jdk/pull/10643 From fyang at openjdk.org Mon Oct 17 04:33:08 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 17 Oct 2022 04:33:08 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v3] In-Reply-To: <2abWu-ITUoN-hNBTy6f0qQN-Q5XuAF3XXbTe7Kz63iU=.350a2155-f2ef-4909-98d8-350306413f74@github.com> References: <05W6k3vqT1b5IGhd653G8zPjCbtiN7HFg8KzZsiMorQ=.38f418d5-540e-46af-a72c-9d6b4471428a@github.com> <9KWs3-ICjuSPKWkcn-hTz0V2rMUrn8B6aqmE2spm5es=.cc94175e-a8f9-468a-991a-656ee2c8c581@github.com> <2abWu-ITUoN-hNBTy6f0qQN-Q5XuAF3XXbTe7Kz63iU=.350a2155-f2ef-4909-98d8-350306413f74@github.com> Message-ID: On Fri, 14 Oct 2022 15:39:41 GMT, Roman Kennke wrote: >>> > > > > > Here is the basic support for RISC-V: https://cr.openjdk.java.net/~shade/8291555/riscv-patch-1.patch >>> > > > > > -- I adapted this from AArch64 changes, and tested it very lightly. @RealFYang, can I leave the testing and follow up fixes to you? >>> > > > > >>> > > > > >>> > > > > @shipilev : Sure, I am happy to to that! Thanks for porting this to RISC-V :-) >>> > > > >>> > > > >>> > > > @shipilev : After applying this on today's jdk master, linux-riscv64 fastdebug fail to build on HiFive Unmatched. I see JVM crash happens during the build process. I suppose you carried out the test with some release build, right? >>> > > >>> > > >>> > > Have you applied the whole PR? Or only the patch that @shipilev provided. Because only the patch without the rest of the PR is bound to fail. 
>>> > >>> > >>> > Yes, the whole PR: https://patch-diff.githubusercontent.com/raw/openjdk/jdk/pull/10590.diff >>> >>> The PR reports a merge conflict in risc-v code, when applied vs latest tip. Have you resolved that? GHA (which includes risc-v) is happy, otherwise. >> >> @rkennke : >> I did see some "Hunk succeeded" messages for the risc-v part when applying the change with: $ patch -p1 < ~/10590.diff >> But I didn't check whether that will cause a problem here. >> >> >> patching file src/hotspot/cpu/riscv/c1_CodeStubs_riscv.cpp >> patching file src/hotspot/cpu/riscv/c1_LIRAssembler_riscv.cpp >> patching file src/hotspot/cpu/riscv/c1_LIRGenerator_riscv.cpp >> patching file src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp >> Hunk #1 succeeded at 58 (offset -1 lines). >> Hunk #2 succeeded at 67 (offset -1 lines). >> patching file src/hotspot/cpu/riscv/c1_Runtime1_riscv.cpp >> patching file src/hotspot/cpu/riscv/interp_masm_riscv.cpp >> patching file src/hotspot/cpu/riscv/macroAssembler_riscv.cpp >> Hunk #1 succeeded at 2499 (offset 324 lines). >> Hunk #2 succeeded at 4474 (offset 330 lines). >> patching file src/hotspot/cpu/riscv/macroAssembler_riscv.hpp >> Hunk #1 succeeded at 869 with fuzz 2 (offset 313 lines). >> Hunk #2 succeeded at 1252 (offset 325 lines). >> patching file src/hotspot/cpu/riscv/riscv.ad >> Hunk #1 succeeded at 2385 (offset 7 lines). >> Hunk #2 succeeded at 2407 (offset 7 lines). >> Hunk #3 succeeded at 2433 (offset 7 lines). >> Hunk #4 succeeded at 10403 (offset 33 lines). >> Hunk #5 succeeded at 10417 (offset 33 lines). >> patching file src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp >> Hunk #1 succeeded at 975 (offset 21 lines). >> Hunk #2 succeeded at 1030 (offset 21 lines). >> Hunk #3 succeeded at 1042 (offset 21 lines). >> Hunk #4 succeeded at 1058 (offset 21 lines). >> Hunk #5 succeeded at 1316 (offset 24 lines). >> Hunk #6 succeeded at 1416 (offset 24 lines). >> Hunk #7 succeeded at 1492 (offset 24 lines). >> Hunk #8 succeeded at 1517 (offset 24 lines). >> Hunk #9 succeeded at 1621 (offset 24 lines). > >> > > > > > > Here is the basic support for RISC-V: https://cr.openjdk.java.net/~shade/8291555/riscv-patch-1.patch >> > > > > > > -- I adapted this from AArch64 changes, and tested it very lightly. @RealFYang, can I leave the testing and follow up fixes to you? >> > > > > > >> > > > > > >> > > > > > @shipilev : Sure, I am happy to to that! Thanks for porting this to RISC-V :-) >> > > > > >> > > > > >> > > > > @shipilev : After applying this on today's jdk master, linux-riscv64 fastdebug fail to build on HiFive Unmatched. I see JVM crash happens during the build process. I suppose you carried out the test with some release build, right? >> > > > >> > > > >> > > > Have you applied the whole PR? Or only the patch that @shipilev provided. Because only the patch without the rest of the PR is bound to fail. >> > > >> > > >> > > Yes, the whole PR: https://patch-diff.githubusercontent.com/raw/openjdk/jdk/pull/10590.diff >> > >> > >> > The PR reports a merge conflict in risc-v code, when applied vs latest tip. Have you resolved that? GHA (which includes risc-v) is happy, otherwise. >> >> @rkennke : I did see some "Hunk succeeded" messages for the risc-v part when applying the change with: $ patch -p1 < ~/10590.diff But I didn't check whether that will cause a problem here. > > If you take the latest code from this PR, it would already have the patch applied. No need to patch it again. @rkennke : Could you please add this follow-up fix for RISC-V? 
I can build fastdebug on HiFive Unmatched board with this fix now and run non-trivial benchmark workloads. I will carry out more tests. [riscv-patch-2.txt](https://github.com/openjdk/jdk/files/9796886/riscv-patch-2.txt) ------------- PR: https://git.openjdk.org/jdk/pull/10590 From xlinzheng at openjdk.org Mon Oct 17 06:00:58 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Mon, 17 Oct 2022 06:00:58 GMT Subject: Integrated: 8295110: RISC-V: Mark out relocations as incompressible In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 05:32:51 GMT, Xiaolin Zheng wrote: > This patch marks all relocations incompressible as pre-discussions at [1] and converts instructions to their 2-byte compressible counterparts as much as possible when UseRVC is enabled. > > Chaining PR #10421. > > 1. Code size reduction rate: about ~17% now after this patch under RVC, meaning if there's a piece of code of 1000 bytes, it may shrink to 830 bytes when RVC is enabled. [2] > 2. Performance: conservatively no regressions observed. [3] > > The overloaded `relocate()` methods hide `IncompressibleRegion`s inside, to exclude instructions used at relocations from being compressed. > > > Having tested several times hotspot tier1~tier4; Testing another turn on board. > > > > [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000615.html > [2] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000633.html > [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-October/000656.html This pull request has now been integrated. Changeset: 9005af3b Author: Xiaolin Zheng Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/9005af3b90fbd3607aeb83efe1c4a6ffa5d104f0 Stats: 312 lines in 12 files changed: 148 ins; 3 del; 161 mod 8295110: RISC-V: Mark out relocations as incompressible Reviewed-by: fyang, yadongwang ------------- PR: https://git.openjdk.org/jdk/pull/10643 From shade at openjdk.org Mon Oct 17 07:58:02 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Oct 2022 07:58:02 GMT Subject: RFR: 8294314: Minimize disabled warnings in hotspot [v21] In-Reply-To: <5CNdaZqYt3cnZZBbs9QDN4K7ebprUsUV3W4-D0m32lA=.d46ec09f-97ea-4656-a3fc-78ac5190e76b@github.com> References: <5CNdaZqYt3cnZZBbs9QDN4K7ebprUsUV3W4-D0m32lA=.d46ec09f-97ea-4656-a3fc-78ac5190e76b@github.com> Message-ID: On Sat, 15 Oct 2022 08:11:39 GMT, Kim Barrett wrote: > Looks good. I think you want to do the actual "Approve" to supersede your previous "Request changes" review. Anyway, I think we are good for integration here, @magicus? ------------- PR: https://git.openjdk.org/jdk/pull/10414 From haosun at openjdk.org Mon Oct 17 08:37:03 2022 From: haosun at openjdk.org (Hao Sun) Date: Mon, 17 Oct 2022 08:37:03 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v4] In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 04:12:22 GMT, Hao Sun wrote: >> In this patch, we implement functions histogram_bytecode() and histogram_bytecode_pair() for interpreter AArch64 part. Similar to count_bytecode(), we use atomic operations to update the counters as well. >> >> Here shows part of the message produced with -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options after this patch. 
>> >> >> $ java -XX:+PrintBytecodeHistogram --version | head -20 >> openjdk 20-internal 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) >> >> Histogram of 5004099 executed bytecodes: >> >> absolute relative code name >> ---------------------------------------------------------------------- >> 319124 6.38% dc fast_aload_0 >> 313397 6.26% e0 fast_iload >> 251436 5.02% b6 invokevirtual >> 227428 4.54% 19 aload >> 166054 3.32% a7 goto >> 159167 3.18% 2b aload_1 >> 151803 3.03% de fast_aaccess_0 >> 136787 2.73% 1b iload_1 >> 124037 2.48% 36 istore >> 118791 2.37% 84 iinc >> 118121 2.36% 1c iload_2 >> 110484 2.21% a2 if_icmpge >> >> $ java -XX:+PrintBytecodePairHistogram --version | head -20 >> openjdk 20-internal 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) >> >> Histogram of 4804441 executed bytecode pairs: >> >> absolute relative codes 1st bytecode 2nd bytecode >> ---------------------------------------------------------------------- >> 77602 1.615% 84 a7 iinc goto >> 49749 1.035% 36 e0 istore fast_iload >> 48931 1.018% e0 10 fast_iload bipush >> 46294 0.964% e0 b6 fast_iload invokevirtual >> 42661 0.888% a7 e0 goto fast_iload >> 42243 0.879% 3a 19 astore aload >> 40138 0.835% 19 b9 aload invokeinterface >> 36617 0.762% dc 2b fast_aload_0 aload_1 >> 35745 0.744% b7 dc invokespecial fast_aload_0 >> 35384 0.736% 19 b6 aload invokevirtual >> 35035 0.729% b6 de invokevirtual fast_aaccess_0 >> 34667 0.722% dc b6 fast_aload_0 invokevirtual >> >> >> In order to verfiy the correctness, I took the trace information produced by -XX:+TraceBytecodes as a cross reference. The hit times for some bytecodes/bytecode pairs can be obtained via parsing the trace. Then I compared the hit times with the corresponding "absolute" columns. I randomly selected several bytecodes/bytecode pairs, and the manual comparion results showed that "absolute" columns are correct. >> >> Note-1: count_bytecode() is updated. 1) caller-saved registers are used as temporary registers and there is no need to save/restore them. 2) atomic_addw() should be used since the counter is of int type. >> >> Note-2: As shown by the update in file templateInterpreterGenerator.cpp, function histogram_bytecode() should be invoked only inside !PRODUCT scope. > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Remove the atomic operation to "_index" @nick-arm Could you help take a look at this patch when you have spare time? Thanks. 
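For reviewers who prefer to see the intended logic rather than the generated AArch64 code, the counter updates are conceptually equivalent to the following stand-alone sketch. All names and sizes are invented; the real implementation emits equivalent instructions inline in the template interpreter and stores the counters in the shared histogram tables.

#include <atomic>
#include <cstdio>

namespace histogram_sketch {

const int kNumCodes = 256;                            // one slot per (fast) bytecode

std::atomic<int> bytecode_counts[kNumCodes];          // -XX:+PrintBytecodeHistogram
std::atomic<int> pair_counts[kNumCodes * kNumCodes];  // -XX:+PrintBytecodePairHistogram
int previous_code = 0;                                // "_index"-like state, see below

void count_bytecode(int code) {
  // The counters are plain ints shared by all interpreter threads, so the
  // increment itself must be an atomic add (an atomic add-word on AArch64).
  bytecode_counts[code].fetch_add(1, std::memory_order_relaxed);
}

void count_bytecode_pair(int code) {
  // The previous-code state is updated without atomics; occasional
  // imprecision under races is acceptable for a diagnostic histogram,
  // which matches dropping the atomic operation on "_index" above.
  pair_counts[previous_code * kNumCodes + code].fetch_add(1, std::memory_order_relaxed);
  previous_code = code;
}

}  // namespace histogram_sketch

int main() {
  using namespace histogram_sketch;
  const int trace[] = {0x84, 0xa7, 0x84, 0xa7};       // iinc, goto, iinc, goto
  for (int bc : trace) {
    count_bytecode(bc);
    count_bytecode_pair(bc);
  }
  std::printf("iinc executed %d times\n", bytecode_counts[0x84].load());
  std::printf("iinc->goto pairs: %d\n",
              pair_counts[0x84 * kNumCodes + 0xa7].load());
  return 0;
}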
------------- PR: https://git.openjdk.org/jdk/pull/10642 From tschatzl at openjdk.org Mon Oct 17 08:38:57 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Oct 2022 08:38:57 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v6] In-Reply-To: <6VbPACPCtCbUs4cHQTBQk_m6PjSowEhvVJ1ijPJumLE=.0ee6521c-35fa-434a-be9d-6f3907a41a28@github.com> References: <6VbPACPCtCbUs4cHQTBQk_m6PjSowEhvVJ1ijPJumLE=.0ee6521c-35fa-434a-be9d-6f3907a41a28@github.com> Message-ID: On Sun, 16 Oct 2022 05:36:03 GMT, Kim Barrett wrote: >> 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal >> 8155996: Improve concurrent refinement green zone control >> 8134303: Introduce -XX:-G1UseConcRefinement >> >> Please review this change to the control of concurrent refinement. >> >> This new controller takes a different approach to the problem, addressing a >> number of issues. >> >> The old controller used a multiple of the target number of cards to determine >> the range over which increasing numbers of refinement threads should be >> activated, and finally activating mutator refinement. This has a variety of >> problems. It doesn't account for the processing rate, the rate of new dirty >> cards, or the time available to perform the processing. This often leads to >> unnecessary spikes in the number of running refinement threads. It also tends >> to drive the pending number to the target quickly and keep it there, removing >> the benefit from having pending dirty cards filter out new cards for nearby >> writes. It can't delay and leave excess cards in the queue because it could >> be a long time before another buffer is enqueued. >> >> The old controller was triggered by mutator threads enqueing card buffers, >> when the number of cards in the queue exceeded a threshold near the target. >> This required a complex activation protocol between the mutators and the >> refinement threads. >> >> With the new controller there is a primary refinement thread that periodically >> estimates how many refinement threads need to be running to reach the target >> in time for the next GC, along with whether to also activate mutator >> refinement. If the primary thread stops running because it isn't currently >> needed, it sleeps for a period and reevaluates on wakeup. This eliminates any >> involvement in the activation of refinement threads by mutator threads. >> >> The estimate of how many refinement threads are needed uses a prediction of >> time until the next GC, the number of buffered cards, the predicted rate of >> new dirty cards, and the predicted refinement rate. The number of running >> threads is adjusted based on these periodically performed estimates. >> >> This new approach allows more dirty cards to be left in the queue until late >> in the mutator phase, typically reducing the rate of new dirty cards, which >> reduces the amount of concurrent refinement work needed. >> >> It also smooths out the number of running refinement threads, eliminating the >> unnecessarily large spikes that are common with the old method. One benefit >> is that the number of refinement threads (lazily) allocated is often much >> lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem >> described in JDK-8153225.) >> >> This change also provides a new method for calculating for the number of dirty >> cards that should be pending at the start of a GC. 
While this calculation is >> conceptually distinct from the thread control, the two were significanly >> intertwined in the old controller. Changing this calculation separately and >> first would have changed the behavior of the old controller in ways that might >> have introduced regressions. Changing it after the thread control was changed >> would have made it more difficult to test and measure the thread control in a >> desirable configuration. >> >> The old calculation had various problems that are described in JDK-8155996. >> In particular, it can get more or less stuck at low values, and is slow to >> respond to changes. >> >> The old controller provided a number of product options, none of which were >> very useful for real applications, and none of which are very applicable to >> the new controller. All of these are being obsoleted. >> >> -XX:-G1UseAdaptiveConcRefinement >> -XX:G1ConcRefinementGreenZone= >> -XX:G1ConcRefinementYellowZone= >> -XX:G1ConcRefinementRedZone= >> -XX:G1ConcRefinementThresholdStep= >> >> The new controller *could* use G1ConcRefinementGreenZone to provide a fixed >> value for the target number of cards, though it is poorly named for that. >> >> A configuration that was useful for some kinds of debugging and testing was to >> disable G1UseAdaptiveConcRefinement and set g1ConcRefinementGreenZone to a >> very large value, effectively disabling concurrent refinement. To support >> this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic >> option has been added (see JDK-8155996). >> >> The other options are meaningless for the new controller. >> >> Because of these option changes, a CSR and a release note need to accompany >> this change. >> >> Testing: >> mach5 tier1-6 >> various performance tests. >> local (linux-x64) tier1 with -XX:-G1UseConcRefinement >> >> Performance testing found no regressions, but also little or no improvement >> with default options, which was expected. With default options most of our >> performance tests do very little concurrent refinement. And even for those >> that do, while the old controller had a number of problems, the impact of >> those problems is small and hard to measure for most applications. >> >> When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare >> better, particularly when also reducing MaxGCPauseMillis. specjbb2015 with >> MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options >> held constant) showed a statistically significant improvement of about 4.5% >> for critical-jOPS. Using the changed controller, the difference between this >> configuration and the default is fairly small, while the baseline shows >> significant degradation with the more restrictive options. >> >> For all tests and configurations the new controller often creates many fewer >> refinement threads. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - adjust young target length periodically > - use cards in thread-buffers when revising young list target length > - remove remset sampler > - move remset-driven young-gen resizing > - fix type of predict_dirtied_cards_in_threa_buffers > - Merge branch 'master' into crt2 > - comments around alloc_bytes_rate being zero > - tschatzl comments > - changed threads wanted logging per kstefanj > - s/max_cards/mutator_refinement_threshold/ > - ... and 12 more: https://git.openjdk.org/jdk/compare/8487c56f...1631a61a Lgtm, see minor comments. 
src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 309: > 307: // the remembered sets (and many other components), so this thread constantly > 308: // reevaluates the prediction for the remembered set scanning costs, and potentially > 309: // G1Policy resizes the young gen. This may do a premature GC or even Suggestion: // resizes the young gen. This may do a premature GC or even I think this flows better with that word removed, feel free to ignore. src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 146: > 144: uint64_t adjust_threads_period_ms() const; > 145: bool is_in_last_adjustment_period() const; > 146: class RemSetSamplingClosure; Suggestion: class RemSetSamplingClosure; Maybe an added newline to emphasize that declaration. src/hotspot/share/gc/g1/g1_globals.hpp line 189: > 187: range(0, max_intx) \ > 188: \ > 189: product(uintx, G1ConcRefinementServiceIntervalMillis, 300, \ Since this is a product option we need to add this flag as obsolete/removed in the CSR request as already suggested there in a comment. I doubt that anybody ever used this one ever. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.org/jdk/pull/10256 From fyang at openjdk.org Mon Oct 17 08:56:02 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 17 Oct 2022 08:56:02 GMT Subject: RFR: 8295396: RISC-V: Cleanup useless CompressibleRegions In-Reply-To: <8IC71PRQP2ha10s_zyrPU3fsqvCCMXSUFXC5gjS_DUw=.165ce3bb-92b6-4d4f-bfcd-4928c09ea352@github.com> References: <8IC71PRQP2ha10s_zyrPU3fsqvCCMXSUFXC5gjS_DUw=.165ce3bb-92b6-4d4f-bfcd-4928c09ea352@github.com> Message-ID: On Mon, 17 Oct 2022 07:57:00 GMT, Xiaolin Zheng wrote: > Cleanup no longer used old code introduced in the riscv-port repo after #10643, in which we have marked out all incompressible places so all other generated instructions could be safely compressed if compressible. Hence the old `CompressibleRegion`s are useless and need a cleanup. > > After this patch there are only two places using `CompressibleRegion`s: `MachNopNode::emit()` and `MacroAssembler::align()`. These `CompressibleRegion`s cover the nops used for alignment purposes and cannot be removed: the nops need to stay 2-byte when RVC is enabled, because if we are at a 2-byte boundary and want to do `align(4)`, doing that with only 4-byte nops would be impossible. So under RVC the nops for alignment purposes must be 2-byte ones. The other `CompressibleRegion`s are useless now. > > Has passed hotspot tier1~4 together with other patches; another tier1 is running. Looks good. ------------- Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/10722 From ihse at openjdk.org Mon Oct 17 09:20:15 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 17 Oct 2022 09:20:15 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 08:18:08 GMT, Christian Hagedorn wrote: >> The DWARF debugging symbols emitted by Clang are different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc.
This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. >> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Always read full filename and strip prefix path and only then cut filename to fit output buffer > - Merge branch 'master' into JDK-8293422 > - Merge branch 'master' into JDK-8293422 > - Review comments from Thomas > - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ > - 8293422: DWARF emitted by Clang cannot be parsed Build changes are approved. ------------- Marked as reviewed by ihse (Reviewer). PR: https://git.openjdk.org/jdk/pull/10287 From ihse at openjdk.org Mon Oct 17 09:34:12 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 17 Oct 2022 09:34:12 GMT Subject: Integrated: 8294314: Minimize disabled warnings in hotspot In-Reply-To: References: Message-ID: On Fri, 23 Sep 2022 20:22:37 GMT, Magnus Ihse Bursie wrote: > After [JDK-8294281](https://bugs.openjdk.org/browse/JDK-8294281), it is now possible to disable warnings for individual files instead for whole libraries. I used this opportunity to go through all disabled warnings in hotspot. > > Any warnings that were only triggered in a few files were removed from hotspot as a whole, and changed to be only disabled for those files. > > Some warnings didn't trigger in any file anymore, and could just be removed. > > Overall, this reduced the number of disabled warnings by roughly half for gcc, clang and visual studio. The remaining warnings are sorted in "frequency", that is, the first listed warnings are triggered in the most number of files, while the last in the fewest number of files. So if anyone were to try to address the remaining warnings, it would make sense to chop of this list from the back. > > I believe the warnings that are disabled on a per-file basis can most likely be fixed relatively easily. > > I have verified this by Oracle's internal CI system, and GitHub Actions. (But I have not yet gotten a fully green run due to instabilities in GHA, however this patch can't reasonably have anything to do with that.) 
As always, warnings tend to differ a bit between compilers, so if someone wants to take this on a spin with some other version, please go ahead. If I missed some warning, in worst case we'll just have to add it back again, and in the meanwhile `configure --disable-warnings-as-errors` is an okay workaround. > > It also turned out that JDK-8294281 did not save the new per-file warnings in VarDeps, so I had to move $1_WARNINGS_FLAGS from $1_BASE_CFLAGS to $1_CFLAGS (and similar for C++). > > Annoyingly, the assert macro triggers `tautological-constant-out-of-range-compare` on clang, so while this is a single problem in a single file, this erupts all over the place in debug builds. If this can be fixed, the ugly extra `DISABLED_WARNINGS_clang += tautological-constant-out-of-range-compare` for non-release builds can be removed. This pull request has now been integrated. Changeset: 7743345f Author: Magnus Ihse Bursie URL: https://git.openjdk.org/jdk/commit/7743345f6f73398f280fd18364b4cea10a6b0f2f Stats: 65 lines in 5 files changed: 41 ins; 8 del; 16 mod 8294314: Minimize disabled warnings in hotspot Co-authored-by: Aleksey Shipilev Reviewed-by: erikj, shade ------------- PR: https://git.openjdk.org/jdk/pull/10414 From ngasson at openjdk.org Mon Oct 17 09:42:12 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Mon, 17 Oct 2022 09:42:12 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v4] In-Reply-To: References: Message-ID: <2fPXHJ93UZofnJAsmvV1CuKFWZHHCTmGslt2T4GV0bs=.8f3e9cb0-ec47-47e4-9650-7ba973a6b709@github.com> On Thu, 13 Oct 2022 04:12:22 GMT, Hao Sun wrote: >> In this patch, we implement functions histogram_bytecode() and histogram_bytecode_pair() for interpreter AArch64 part. Similar to count_bytecode(), we use atomic operations to update the counters as well. >> >> Here shows part of the message produced with -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options after this patch. 
>> >> >> $ java -XX:+PrintBytecodeHistogram --version | head -20 >> openjdk 20-internal 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) >> >> Histogram of 5004099 executed bytecodes: >> >> absolute relative code name >> ---------------------------------------------------------------------- >> 319124 6.38% dc fast_aload_0 >> 313397 6.26% e0 fast_iload >> 251436 5.02% b6 invokevirtual >> 227428 4.54% 19 aload >> 166054 3.32% a7 goto >> 159167 3.18% 2b aload_1 >> 151803 3.03% de fast_aaccess_0 >> 136787 2.73% 1b iload_1 >> 124037 2.48% 36 istore >> 118791 2.37% 84 iinc >> 118121 2.36% 1c iload_2 >> 110484 2.21% a2 if_icmpge >> >> $ java -XX:+PrintBytecodePairHistogram --version | head -20 >> openjdk 20-internal 2023-03-21 >> OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) >> OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) >> >> Histogram of 4804441 executed bytecode pairs: >> >> absolute relative codes 1st bytecode 2nd bytecode >> ---------------------------------------------------------------------- >> 77602 1.615% 84 a7 iinc goto >> 49749 1.035% 36 e0 istore fast_iload >> 48931 1.018% e0 10 fast_iload bipush >> 46294 0.964% e0 b6 fast_iload invokevirtual >> 42661 0.888% a7 e0 goto fast_iload >> 42243 0.879% 3a 19 astore aload >> 40138 0.835% 19 b9 aload invokeinterface >> 36617 0.762% dc 2b fast_aload_0 aload_1 >> 35745 0.744% b7 dc invokespecial fast_aload_0 >> 35384 0.736% 19 b6 aload invokevirtual >> 35035 0.729% b6 de invokevirtual fast_aaccess_0 >> 34667 0.722% dc b6 fast_aload_0 invokevirtual >> >> >> In order to verfiy the correctness, I took the trace information produced by -XX:+TraceBytecodes as a cross reference. The hit times for some bytecodes/bytecode pairs can be obtained via parsing the trace. Then I compared the hit times with the corresponding "absolute" columns. I randomly selected several bytecodes/bytecode pairs, and the manual comparion results showed that "absolute" columns are correct. >> >> Note-1: count_bytecode() is updated. 1) caller-saved registers are used as temporary registers and there is no need to save/restore them. 2) atomic_addw() should be used since the counter is of int type. >> >> Note-2: As shown by the update in file templateInterpreterGenerator.cpp, function histogram_bytecode() should be invoked only inside !PRODUCT scope. > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Remove the atomic operation to "_index" Marked as reviewed by ngasson (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10642 From luhenry at openjdk.org Mon Oct 17 09:47:55 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 17 Oct 2022 09:47:55 GMT Subject: RFR: 8294211: Zero: Decode arch-specific error context if possible [v4] In-Reply-To: References: <68x5tTa2dHOc-tAj6OfoLSK2MNK2GN2io8qj1POUT2I=.b4ef9779-3b2a-4af4-aa1b-166375420547@github.com> Message-ID: On Thu, 13 Oct 2022 19:10:36 GMT, Aleksey Shipilev wrote: >> After POSIX signal refactorings, Zero error handling had "regressed" a bit: Zero always gets `NULL` as `pc` in error handling code, and thus it fails with SEGV at pc=0x0. We can do better by implementing context decoding where possible. >> >> Unfortunately, this introduces some arch-specific code in Zero code. 
The arch-specific code is copy-pasted (with inline definitions, if needed) from the relevant `os_linux_*.cpp` files. The unimplemented arches would still report the same confusing `hs_err`-s. We can emulate (and thus test) the generic behavior using new diagnostic VM option. >> >> This reverts parts of [JDK-8259392](https://bugs.openjdk.org/browse/JDK-8259392). >> >> Sample test: >> >> >> import java.lang.reflect.*; >> import sun.misc.Unsafe; >> >> public class Crash { >> public static void main(String... args) throws Exception { >> Field f = Unsafe.class.getDeclaredField("theUnsafe"); >> f.setAccessible(true); >> Unsafe u = (Unsafe) f.get(null); >> u.getInt(42); // accesing via broken ptr >> } >> } >> >> >> Linux x86_64 Zero fastdebug crash currently: >> >> >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # SIGSEGV (0xb) at pc=0x0000000000000000, pid=538793, tid=538794 >> # >> ... >> # (no native frame info) >> ... >> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a >> >> >> Linux x86_64 Zero fastdebug crash with this patch: >> >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # SIGSEGV (0xb) at pc=0x00007fbbbf08b584, pid=520119, tid=520120 >> # >> ... >> # Problematic frame: >> # V [libjvm.so+0xcbe584] Unsafe_GetInt+0xe4 >> .... >> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a >> >> >> Linux x86_64 Zero fastdebug crash with this patch and `-XX:-DecodeErrorContext`: >> >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # SIGSEGV (0xb) at pc=0x0000000000000000, pid=520268, tid=520269 >> # >> ... >> # Problematic frame: >> # C 0x0000000000000000 >> ... >> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a >> >> >> Additional testing: >> - [x] Linux x86_64 Zero fastdebug eyeballing crash logs >> - [x] Linux x86_64 Zero fastdebug, `tier1` >> - [x] Linux {x86_64, x86_32, aarch64, arm, riscv64, s390x, ppc64le, ppc64be} Zero fastdebug builds > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into JDK-8294211-zero-error-context > - Merge branch 'master' into JDK-8294211-zero-error-context > - Merge branch 'master' into JDK-8294211-zero-error-context > - Style nits > - Fix Marked as reviewed by luhenry (Author). src/hotspot/share/runtime/arguments.cpp line 3157: > 3155: > 3156: // Enable error context decoding on known platforms > 3157: #if defined(IA32) || defined(AMD64) || defined(ARM) || \ What's the reasoning behind putting it here vs `src/hotspot/cpu/zero/vm_version_zero.cpp`? ------------- PR: https://git.openjdk.org/jdk/pull/10397 From haosun at openjdk.org Mon Oct 17 09:55:57 2022 From: haosun at openjdk.org (Hao Sun) Date: Mon, 17 Oct 2022 09:55:57 GMT Subject: RFR: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options [v4] In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 07:37:52 GMT, Andrew Haley wrote: >> Hao Sun has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove the atomic operation to "_index" > > OK, thanks. Thanks for your reviews! 
@theRealAph and @nick-arm ------------- PR: https://git.openjdk.org/jdk/pull/10642 From rkennke at openjdk.org Mon Oct 17 10:13:13 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 17 Oct 2022 10:13:13 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v6] In-Reply-To: References: Message-ID: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. 
The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? > > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. 
> > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 496.076 | 493.873 | 0.45% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaKmeans | 259.384 | 258.648 | 0.28% > Philosophers | 24333.311 | 23438.22 | 3.82% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > ParMnemonics | 2016.917 | 2033.101 | -0.80% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaDoku | 2193.562 | 1958.419 | 12.01% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > Philosophers | 14268.449 | 13308.87 | 7.21% > FinagleChirper | 4722.13 | 4688.3 | 0.72% > FinagleHttp | 3497.241 | 3605.118 | -2.99% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). 
> > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: More RISC-V fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10590/files - new: https://git.openjdk.org/jdk/pull/10590/files/8d146b99..57403ad1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=04-05 Stats: 37 lines in 5 files changed: 0 ins; 8 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/10590.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10590/head:pull/10590 PR: https://git.openjdk.org/jdk/pull/10590 From shade at openjdk.org Mon Oct 17 10:17:22 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Oct 2022 10:17:22 GMT Subject: RFR: 8294211: Zero: Decode arch-specific error context if possible [v5] In-Reply-To: <68x5tTa2dHOc-tAj6OfoLSK2MNK2GN2io8qj1POUT2I=.b4ef9779-3b2a-4af4-aa1b-166375420547@github.com> References: <68x5tTa2dHOc-tAj6OfoLSK2MNK2GN2io8qj1POUT2I=.b4ef9779-3b2a-4af4-aa1b-166375420547@github.com> Message-ID: > After POSIX signal refactorings, Zero error handling had "regressed" a bit: Zero always gets `NULL` as `pc` in error handling code, and thus it fails with SEGV at pc=0x0. We can do better by implementing context decoding where possible. > > Unfortunately, this introduces some arch-specific code in Zero code. The arch-specific code is copy-pasted (with inline definitions, if needed) from the relevant `os_linux_*.cpp` files. The unimplemented arches would still report the same confusing `hs_err`-s. We can emulate (and thus test) the generic behavior using new diagnostic VM option. > > This reverts parts of [JDK-8259392](https://bugs.openjdk.org/browse/JDK-8259392). > > Sample test: > > > import java.lang.reflect.*; > import sun.misc.Unsafe; > > public class Crash { > public static void main(String... args) throws Exception { > Field f = Unsafe.class.getDeclaredField("theUnsafe"); > f.setAccessible(true); > Unsafe u = (Unsafe) f.get(null); > u.getInt(42); // accesing via broken ptr > } > } > > > Linux x86_64 Zero fastdebug crash currently: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x0000000000000000, pid=538793, tid=538794 > # > ... > # (no native frame info) > ... > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a > > > Linux x86_64 Zero fastdebug crash with this patch: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007fbbbf08b584, pid=520119, tid=520120 > # > ... > # Problematic frame: > # V [libjvm.so+0xcbe584] Unsafe_GetInt+0xe4 > .... > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a > > > Linux x86_64 Zero fastdebug crash with this patch and `-XX:-DecodeErrorContext`: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x0000000000000000, pid=520268, tid=520269 > # > ... > # Problematic frame: > # C 0x0000000000000000 > ... 
> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a > > > Additional testing: > - [x] Linux x86_64 Zero fastdebug eyeballing crash logs > - [x] Linux x86_64 Zero fastdebug, `tier1` > - [x] Linux {x86_64, x86_32, aarch64, arm, riscv64, s390x, ppc64le, ppc64be} Zero fastdebug builds Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Move argument block - Merge branch 'master' into JDK-8294211-zero-error-context - Merge branch 'master' into JDK-8294211-zero-error-context - Merge branch 'master' into JDK-8294211-zero-error-context - Merge branch 'master' into JDK-8294211-zero-error-context - Style nits - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10397/files - new: https://git.openjdk.org/jdk/pull/10397/files/fe524d40..d9684abc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10397&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10397&range=03-04 Stats: 3281 lines in 92 files changed: 1783 ins; 877 del; 621 mod Patch: https://git.openjdk.org/jdk/pull/10397.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10397/head:pull/10397 PR: https://git.openjdk.org/jdk/pull/10397 From shade at openjdk.org Mon Oct 17 10:17:24 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Oct 2022 10:17:24 GMT Subject: RFR: 8294211: Zero: Decode arch-specific error context if possible [v4] In-Reply-To: References: <68x5tTa2dHOc-tAj6OfoLSK2MNK2GN2io8qj1POUT2I=.b4ef9779-3b2a-4af4-aa1b-166375420547@github.com> Message-ID: On Mon, 17 Oct 2022 09:45:39 GMT, Ludovic Henry wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8294211-zero-error-context >> - Merge branch 'master' into JDK-8294211-zero-error-context >> - Merge branch 'master' into JDK-8294211-zero-error-context >> - Style nits >> - Fix > > src/hotspot/share/runtime/arguments.cpp line 3157: > >> 3155: >> 3156: // Enable error context decoding on known platforms >> 3157: #if defined(IA32) || defined(AMD64) || defined(ARM) || \ > > What's the reasoning behind putting it here vs `src/hotspot/cpu/zero/vm_version_zero.cpp`? No good reason, really. I moved this to `vm_version_zero.cpp`. ------------- PR: https://git.openjdk.org/jdk/pull/10397 From kbarrett at openjdk.org Mon Oct 17 11:50:03 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 17 Oct 2022 11:50:03 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v7] In-Reply-To: References: Message-ID: > 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal > 8155996: Improve concurrent refinement green zone control > 8134303: Introduce -XX:-G1UseConcRefinement > > Please review this change to the control of concurrent refinement. > > This new controller takes a different approach to the problem, addressing a > number of issues. > > The old controller used a multiple of the target number of cards to determine > the range over which increasing numbers of refinement threads should be > activated, and finally activating mutator refinement. This has a variety of > problems. 
It doesn't account for the processing rate, the rate of new dirty > cards, or the time available to perform the processing. This often leads to > unnecessary spikes in the number of running refinement threads. It also tends > to drive the pending number to the target quickly and keep it there, removing > the benefit from having pending dirty cards filter out new cards for nearby > writes. It can't delay and leave excess cards in the queue because it could > be a long time before another buffer is enqueued. > > The old controller was triggered by mutator threads enqueuing card buffers, > when the number of cards in the queue exceeded a threshold near the target. > This required a complex activation protocol between the mutators and the > refinement threads. > > With the new controller there is a primary refinement thread that periodically > estimates how many refinement threads need to be running to reach the target > in time for the next GC, along with whether to also activate mutator > refinement. If the primary thread stops running because it isn't currently > needed, it sleeps for a period and reevaluates on wakeup. This eliminates any > involvement in the activation of refinement threads by mutator threads. > > The estimate of how many refinement threads are needed uses a prediction of > time until the next GC, the number of buffered cards, the predicted rate of > new dirty cards, and the predicted refinement rate. The number of running > threads is adjusted based on these periodically performed estimates. > > This new approach allows more dirty cards to be left in the queue until late > in the mutator phase, typically reducing the rate of new dirty cards, which > reduces the amount of concurrent refinement work needed. > > It also smooths out the number of running refinement threads, eliminating the > unnecessarily large spikes that are common with the old method. One benefit > is that the number of refinement threads (lazily) allocated is often much > lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem > described in JDK-8153225.) > > This change also provides a new method for calculating the number of dirty > cards that should be pending at the start of a GC. While this calculation is > conceptually distinct from the thread control, the two were significantly > intertwined in the old controller. Changing this calculation separately and > first would have changed the behavior of the old controller in ways that might > have introduced regressions. Changing it after the thread control was changed > would have made it more difficult to test and measure the thread control in a > desirable configuration. > > The old calculation had various problems that are described in JDK-8155996. > In particular, it can get more or less stuck at low values, and is slow to > respond to changes. > > The old controller provided a number of product options, none of which were > very useful for real applications, and none of which are very applicable to > the new controller. All of these are being obsoleted. > > -XX:-G1UseAdaptiveConcRefinement > -XX:G1ConcRefinementGreenZone= > -XX:G1ConcRefinementYellowZone= > -XX:G1ConcRefinementRedZone= > -XX:G1ConcRefinementThresholdStep= > > The new controller *could* use G1ConcRefinementGreenZone to provide a fixed > value for the target number of cards, though it is poorly named for that.
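As a rough illustration of the sizing decision described above (the names and the exact formula here are made up; the real G1 code uses its predictors and is more involved), the per-period calculation boils down to something like:

#include <cmath>
#include <cstddef>

// Roughly: how many refinement threads must run so that pending dirty cards
// are driven down to the target before the predicted GC? Illustrative only.
size_t refinement_threads_needed(double pending_cards,
                                 double target_cards,
                                 double new_cards_per_ms,             // predicted dirtying rate
                                 double refined_cards_per_thread_ms,  // predicted per-thread rate
                                 double ms_until_gc) {
  if (ms_until_gc <= 0.0 || refined_cards_per_thread_ms <= 0.0) {
    return 0;
  }
  double expected_incoming = new_cards_per_ms * ms_until_gc;
  double excess = pending_cards + expected_incoming - target_cards;
  if (excess <= 0.0) {
    return 0;                               // the target will be met with no help
  }
  double required_rate = excess / ms_until_gc;   // cards per ms
  return static_cast<size_t>(std::ceil(required_rate / refined_cards_per_thread_ms));
}

A result of zero lets the primary refinement thread sleep and re-evaluate later; a result larger than the available refinement threads is the point at which mutator refinement would also be activated.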
> > A configuration that was useful for some kinds of debugging and testing was to > disable G1UseAdaptiveConcRefinement and set g1ConcRefinementGreenZone to a > very large value, effectively disabling concurrent refinement. To support > this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic > option has been added (see JDK-8155996). > > The other options are meaningless for the new controller. > > Because of these option changes, a CSR and a release note need to accompany > this change. > > Testing: > mach5 tier1-6 > various performance tests. > local (linux-x64) tier1 with -XX:-G1UseConcRefinement > > Performance testing found no regressions, but also little or no improvement > with default options, which was expected. With default options most of our > performance tests do very little concurrent refinement. And even for those > that do, while the old controller had a number of problems, the impact of > those problems is small and hard to measure for most applications. > > When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare > better, particularly when also reducing MaxGCPauseMillis. specjbb2015 with > MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options > held constant) showed a statistically significant improvement of about 4.5% > for critical-jOPS. Using the changed controller, the difference between this > configuration and the default is fairly small, while the baseline shows > significant degradation with the more restrictive options. > > For all tests and configurations the new controller often creates many fewer > refinement threads. Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: - more copyright updates - tschatzl comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10256/files - new: https://git.openjdk.org/jdk/pull/10256/files/1631a61a..f6662698 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10256&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10256&range=05-06 Stats: 9 lines in 6 files changed: 3 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10256.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10256/head:pull/10256 PR: https://git.openjdk.org/jdk/pull/10256 From kbarrett at openjdk.org Mon Oct 17 11:50:08 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 17 Oct 2022 11:50:08 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v6] In-Reply-To: References: <6VbPACPCtCbUs4cHQTBQk_m6PjSowEhvVJ1ijPJumLE=.0ee6521c-35fa-434a-be9d-6f3907a41a28@github.com> Message-ID: On Mon, 17 Oct 2022 08:28:29 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - adjust young target length periodically >> - use cards in thread-buffers when revising young list target length >> - remove remset sampler >> - move remset-driven young-gen resizing >> - fix type of predict_dirtied_cards_in_threa_buffers >> - Merge branch 'master' into crt2 >> - comments around alloc_bytes_rate being zero >> - tschatzl comments >> - changed threads wanted logging per kstefanj >> - s/max_cards/mutator_refinement_threshold/ >> - ... 
and 12 more: https://git.openjdk.org/jdk/compare/8487c56f...1631a61a > > src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 309: > >> 307: // the remembered sets (and many other components), so this thread constantly >> 308: // reevaluates the prediction for the remembered set scanning costs, and potentially >> 309: // G1Policy resizes the young gen. This may do a premature GC or even > > Suggestion: > > // resizes the young gen. This may do a premature GC or even > > I think this flows better with that word removed, feel free to ignore. Done. > src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 146: > >> 144: uint64_t adjust_threads_period_ms() const; >> 145: bool is_in_last_adjustment_period() const; >> 146: class RemSetSamplingClosure; > > Suggestion: > > > class RemSetSamplingClosure; > > > Maybe an added newline to emphasize that declaration. Done. ------------- PR: https://git.openjdk.org/jdk/pull/10256 From kbarrett at openjdk.org Mon Oct 17 11:50:08 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 17 Oct 2022 11:50:08 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v7] In-Reply-To: References: <6VbPACPCtCbUs4cHQTBQk_m6PjSowEhvVJ1ijPJumLE=.0ee6521c-35fa-434a-be9d-6f3907a41a28@github.com> Message-ID: On Mon, 17 Oct 2022 08:34:05 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: >> >> - more copyright updates >> - tschatzl comments > > src/hotspot/share/gc/g1/g1_globals.hpp line 189: > >> 187: range(0, max_intx) \ >> 188: \ >> 189: product(uintx, G1ConcRefinementServiceIntervalMillis, 300, \ > > Since this is a product option we need to add this flag as obsolete/removed in the CSR request as already suggested there in a comment. > > I doubt that anybody ever used this one ever. Oops, forgot that. I did remember to move the CSR back to draft, pending updating it for this change. ------------- PR: https://git.openjdk.org/jdk/pull/10256 From mgronlun at openjdk.org Mon Oct 17 12:11:59 2022 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 17 Oct 2022 12:11:59 GMT Subject: RFR: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events [v2] In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 08:52:11 GMT, Joakim Nordstr?m wrote: >> Changed the JFR chunk rotation lock object to specific internal class. This allows that specific Object.wait() event to be skipped, thus not adding JFR internal noise to recordings. >> >> # Testing >> - jdk_jfr > > Joakim Nordstr?m has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Merge branch 'master' into JDK-8286707-jfr-dont-commit-jfr-internal-jdk-javamonitorwait-events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events There exist a general exclusion/inclusion mechanism already. But it is an all-or-nothing proposition. 
This particular case is a thread that we can't exclude because it runs the periodic events, upon being notified. It is the notification mechanism to run the periodic events that trigger this large amount of unnecessary MonitorWait events. Even should we change it to some util.concurrent construct, we are only pushing the problem, because we might be instrumenting them later too. To work with the existing exclusion mechanism, the system would have to introduce an additional thread, which will be excluded, which only handles the notification, and then by some other means triggers another periodic thread (included) to run the periodic events. ------------- PR: https://git.openjdk.org/jdk/pull/8883 From luhenry at openjdk.org Mon Oct 17 12:13:18 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 17 Oct 2022 12:13:18 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v2] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Fix comment - Fix alignement - Merge branch 'master' of github.com:openjdk/jdk into dev/ludovic/upstream-zicboz - 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. [1] https://github.com/riscv/riscv-CMOs [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/5daadc29..845d0e3b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=00-01 Stats: 3715 lines in 108 files changed: 2137 ins; 902 del; 676 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From stuefe at openjdk.org Mon Oct 17 12:45:30 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 17 Oct 2022 12:45:30 GMT Subject: RFR: JDK-8293711: Factor out size parsing functions from arguments.cpp [v2] In-Reply-To: References: <3AIWW1prhISoIn6Bnw6f_JkcSQqBJeMQMwisfOnI8vU=.ff7dae90-7da6-471e-9fdf-49ac3f6429ee@github.com> <7JN3D-ftzKvAvCRgNACnwQMEAPIOk7NnJl_BWbdnGB0=.00393e3c-6613-4605-b1a9-279d0c09ca6e@github.com> <2txQ-RFzFybqprusuQbnZWexNi9GNV0cZ4Y1FC4PM4I=.2922afc3-6bad-493e-8342-8e2872d760ea@github.com> Message-ID: <6IpOb6efGfQQK3zpfsLfuZKN6iAPDlmfRJ_NpyxL7z8=.98de0c94-d6a6-4eb1-b0a9-24571807efc5@github.com> On Mon, 26 Sep 2022 07:17:29 GMT, Stefan Karlsson wrote: >>> > > is the new version acceptable? 
>>> > >>> > >>> > I would have preferred if the parsing code were not placed in a .hpp file, and would have placed it in a .inline.hpp file, to comply with our guidelines. Other than that, I'm OK with this patch. >>> >>> Almost missed your comment :-) >>> >>> I'll split it up as you suggested, then push. >> >> Wait, since the whole thing is all parsing, would that not mean we have just a parseInteger.inline.hpp, without a parseInteger.hpp? Since I'm unsure what would be left to put into parseInteger.hpp. And a parseInteger.inline.hpp without a parseInteger.hpp makes not much sense, or? >> >> We have some other one-trick-pony-headers that are not .inline and contain their whole functionality: powerOfTwo.hpp, population_count.hpp (probably should rename that), pair.hpp... > >> > > > is the new version acceptable? >> > > >> > > >> > > I would have preferred if the parsing code were not placed in a .hpp file, and would have placed it in a .inline.hpp file, to comply with our guidelines. Other than that, I'm OK with this patch. >> > >> > >> > Almost missed your comment :-) >> > I'll split it up as you suggested, then push. >> >> Wait, since the whole thing is all parsing, would that not mean we have just a parseInteger.inline.hpp, without a parseInteger.hpp? Since I'm unsure what would be left to put into parseInteger.hpp. And a parseInteger.inline.hpp without a parseInteger.hpp makes not much sense, or? >> >> We have some other one-trick-pony-headers that are not .inline and contain their whole functionality: powerOfTwo.hpp, population_count.hpp (probably should rename that), pair.hpp... > > Right. There's a pragmatic side to the rules w.r.t .hpp vs .inline.hpp files. If the file doesn't have dependencies to "non-trivial" header files, then we typically let it slide and put the code in the .hpp, as you have seen in the files you list. Another notable example is atomic.inline.hpp, which was renamed to atomic.hpp. I think we did that, since were too many header files that used Atomic, and it would have been a lot of work to clean that up. > > The questions for me now are: > 1) Is this header small enough that it doesn't pull in large amounts of other headers? > 2) Is it likely to be changed in the future to include "non-trivial" headers? > 3) Is it likely to be used in other .hpp files? > > I think this it is small enough (1), will only be used in .cpp/.inline.hpp files (3), but I'm unsure about (2). > > I'll leave it to your judgement to decide if this is fine to leave in the .hpp file. Hi @stefank, sorry, I was gone for vacation and did not want to rush this before leaving. > > > > > is the new version acceptable? > > > > > > > > > > > > I would have preferred if the parsing code were not placed in a .hpp file, and would have placed it in a .inline.hpp file, to comply with our guidelines. Other than that, I'm OK with this patch. > > > > > > > > > Almost missed your comment :-) > > > I'll split it up as you suggested, then push. > > > > > > Wait, since the whole thing is all parsing, would that not mean we have just a parseInteger.inline.hpp, without a parseInteger.hpp? Since I'm unsure what would be left to put into parseInteger.hpp. And a parseInteger.inline.hpp without a parseInteger.hpp makes not much sense, or? > > We have some other one-trick-pony-headers that are not .inline and contain their whole functionality: powerOfTwo.hpp, population_count.hpp (probably should rename that), pair.hpp... > > Right. There's a pragmatic side to the rules w.r.t .hpp vs .inline.hpp files. 
If the file doesn't have dependencies to "non-trivial" header files, then we typically let it slide and put the code in the .hpp, as you have seen in the files you list. Another notable example is atomic.inline.hpp, which was renamed to atomic.hpp. I think we did that, since were too many header files that used Atomic, and it would have been a lot of work to clean that up. > > The questions for me now are: > > 1. Is this header small enough that it doesn't pull in large amounts of other headers? > 2. Is it likely to be changed in the future to include "non-trivial" headers? > 3. Is it likely to be used in other .hpp files? > > I think this it is small enough (1), will only be used in .cpp/.inline.hpp files (3), but I'm unsure about (2). > > I'll leave it to your judgement to decide if this is fine to leave in the .hpp file. I don't think this file will see much changes in the future. So I think we can leave the implementations in the hpp file. Thank you! ------------- PR: https://git.openjdk.org/jdk/pull/10252 From stuefe at openjdk.org Mon Oct 17 12:48:01 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 17 Oct 2022 12:48:01 GMT Subject: Integrated: JDK-8293711: Factor out size parsing functions from arguments.cpp In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 15:19:34 GMT, Thomas Stuefe wrote: > Arguments.cpp has several size parsing functions which would be useful in other areas of the hotspot, e.g. in NMT. > > It would be nice to have them factored out into utilities, to reuse the code and to unify memory size handling. Gtests would be good too. > > To simplify reviews, I split the patch into two commits. > > The first commit (https://github.com/openjdk/jdk/pull/10252/commits/700e77e8d1469a2fc3d6611072c4b07aa34ab8e6) contains the unchanged code move without functional changes. > > The second commit (https://github.com/openjdk/jdk/pull/10252/commits/76b4f6f30cc316fd966da60ff6601f54eeb394bf) contains the functional changes I did, as well as the new gtest. This pull request has now been integrated. Changeset: ec2981b8 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/ec2981b83bc3ef6977b5f16d5222eb49b0ea49ad Stats: 449 lines in 6 files changed: 339 ins; 110 del; 0 mod 8293711: Factor out size parsing functions from arguments.cpp Reviewed-by: dholmes, jsjolen ------------- PR: https://git.openjdk.org/jdk/pull/10252 From shade at openjdk.org Mon Oct 17 13:59:30 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Oct 2022 13:59:30 GMT Subject: RFR: 8294591: Fix cast-function-type warning in TemplateTable [v3] In-Reply-To: References: Message-ID: > After [JDK-8294314](https://bugs.openjdk.org/browse/JDK-8294314), we would have `templateTable.cpp` excluded with cast-function-type warning. The underlying cause for it is casting functions for `ldc` bytecodes, which take `bool`-typed handlers: Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: - Merge branch 'master' into JDK-8294591-warning-cast-function-type-templatetable - Also disable warnings in gtests - Fix ------------- Changes: https://git.openjdk.org/jdk/pull/10493/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10493&range=02 Stats: 41 lines in 10 files changed: 6 ins; 1 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/10493.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10493/head:pull/10493 PR: https://git.openjdk.org/jdk/pull/10493 From egahlin at openjdk.org Mon Oct 17 14:12:18 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Mon, 17 Oct 2022 14:12:18 GMT Subject: RFR: 8280131: jcmd reports "Module jdk.jfr not found." when "jdk.management.jfr" is missing Message-ID: Could I have a review of a PR that ensures JFR can be used when only the jdk.jfr module is present in an image. The behavior is similar to how -javaagent adds the java.instrument module and "jcmd PID ManagementAgent.status" loads the jdk.management.agent module. TestJfrJavaBase.java is replaced with TestModularImage.java. The former test could not be used since the jdk.jfr module is now added to the module graph when -XX:StartFlightRecording is specified. Testing: tier1-3 + test/jdk/jdk/jfr Thanks Erik ------------- Commit messages: - Fix disabled - Fix pointer format - Initial Changes: https://git.openjdk.org/jdk/pull/10723/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10723&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8280131 Stats: 306 lines in 6 files changed: 232 ins; 73 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10723.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10723/head:pull/10723 PR: https://git.openjdk.org/jdk/pull/10723 From haosun at openjdk.org Mon Oct 17 14:49:56 2022 From: haosun at openjdk.org (Hao Sun) Date: Mon, 17 Oct 2022 14:49:56 GMT Subject: Integrated: 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 01:27:23 GMT, Hao Sun wrote: > In this patch, we implement functions histogram_bytecode() and histogram_bytecode_pair() for interpreter AArch64 part. Similar to count_bytecode(), we use atomic operations to update the counters as well. > > Here shows part of the message produced with -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options after this patch. 
> > > $ java -XX:+PrintBytecodeHistogram --version | head -20 > openjdk 20-internal 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) > > Histogram of 5004099 executed bytecodes: > > absolute relative code name > ---------------------------------------------------------------------- > 319124 6.38% dc fast_aload_0 > 313397 6.26% e0 fast_iload > 251436 5.02% b6 invokevirtual > 227428 4.54% 19 aload > 166054 3.32% a7 goto > 159167 3.18% 2b aload_1 > 151803 3.03% de fast_aaccess_0 > 136787 2.73% 1b iload_1 > 124037 2.48% 36 istore > 118791 2.37% 84 iinc > 118121 2.36% 1c iload_2 > 110484 2.21% a2 if_icmpge > > $ java -XX:+PrintBytecodePairHistogram --version | head -20 > openjdk 20-internal 2023-03-21 > OpenJDK Runtime Environment (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev) > OpenJDK 64-Bit Server VM (fastdebug build 20-internal-adhoc.haosun.jdk-src-dev, mixed mode) > > Histogram of 4804441 executed bytecode pairs: > > absolute relative codes 1st bytecode 2nd bytecode > ---------------------------------------------------------------------- > 77602 1.615% 84 a7 iinc goto > 49749 1.035% 36 e0 istore fast_iload > 48931 1.018% e0 10 fast_iload bipush > 46294 0.964% e0 b6 fast_iload invokevirtual > 42661 0.888% a7 e0 goto fast_iload > 42243 0.879% 3a 19 astore aload > 40138 0.835% 19 b9 aload invokeinterface > 36617 0.762% dc 2b fast_aload_0 aload_1 > 35745 0.744% b7 dc invokespecial fast_aload_0 > 35384 0.736% 19 b6 aload invokevirtual > 35035 0.729% b6 de invokevirtual fast_aaccess_0 > 34667 0.722% dc b6 fast_aload_0 invokevirtual > > > In order to verfiy the correctness, I took the trace information produced by -XX:+TraceBytecodes as a cross reference. The hit times for some bytecodes/bytecode pairs can be obtained via parsing the trace. Then I compared the hit times with the corresponding "absolute" columns. I randomly selected several bytecodes/bytecode pairs, and the manual comparion results showed that "absolute" columns are correct. > > Note-1: count_bytecode() is updated. 1) caller-saved registers are used as temporary registers and there is no need to save/restore them. 2) atomic_addw() should be used since the counter is of int type. > > Note-2: As shown by the update in file templateInterpreterGenerator.cpp, function histogram_bytecode() should be invoked only inside !PRODUCT scope. This pull request has now been integrated. Changeset: ae60599e Author: Hao Sun Committer: Nick Gasson URL: https://git.openjdk.org/jdk/commit/ae60599e2ba75d80c3b4279903137b2c549f8066 Stats: 61 lines in 4 files changed: 23 ins; 33 del; 5 mod 8295023: Interpreter(AArch64): Implement -XX:+PrintBytecodeHistogram and -XX:+PrintBytecodePairHistogram options Reviewed-by: aph, ngasson ------------- PR: https://git.openjdk.org/jdk/pull/10642 From ayang at openjdk.org Mon Oct 17 14:54:10 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 17 Oct 2022 14:54:10 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v7] In-Reply-To: References: Message-ID: On Mon, 17 Oct 2022 11:50:03 GMT, Kim Barrett wrote: >> 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal >> 8155996: Improve concurrent refinement green zone control >> 8134303: Introduce -XX:-G1UseConcRefinement >> >> Please review this change to the control of concurrent refinement. 
>> >> This new controller takes a different approach to the problem, addressing a >> number of issues. >> >> The old controller used a multiple of the target number of cards to determine >> the range over which increasing numbers of refinement threads should be >> activated, and finally activating mutator refinement. This has a variety of >> problems. It doesn't account for the processing rate, the rate of new dirty >> cards, or the time available to perform the processing. This often leads to >> unnecessary spikes in the number of running refinement threads. It also tends >> to drive the pending number to the target quickly and keep it there, removing >> the benefit from having pending dirty cards filter out new cards for nearby >> writes. It can't delay and leave excess cards in the queue because it could >> be a long time before another buffer is enqueued. >> >> The old controller was triggered by mutator threads enqueing card buffers, >> when the number of cards in the queue exceeded a threshold near the target. >> This required a complex activation protocol between the mutators and the >> refinement threads. >> >> With the new controller there is a primary refinement thread that periodically >> estimates how many refinement threads need to be running to reach the target >> in time for the next GC, along with whether to also activate mutator >> refinement. If the primary thread stops running because it isn't currently >> needed, it sleeps for a period and reevaluates on wakeup. This eliminates any >> involvement in the activation of refinement threads by mutator threads. >> >> The estimate of how many refinement threads are needed uses a prediction of >> time until the next GC, the number of buffered cards, the predicted rate of >> new dirty cards, and the predicted refinement rate. The number of running >> threads is adjusted based on these periodically performed estimates. >> >> This new approach allows more dirty cards to be left in the queue until late >> in the mutator phase, typically reducing the rate of new dirty cards, which >> reduces the amount of concurrent refinement work needed. >> >> It also smooths out the number of running refinement threads, eliminating the >> unnecessarily large spikes that are common with the old method. One benefit >> is that the number of refinement threads (lazily) allocated is often much >> lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem >> described in JDK-8153225.) >> >> This change also provides a new method for calculating for the number of dirty >> cards that should be pending at the start of a GC. While this calculation is >> conceptually distinct from the thread control, the two were significanly >> intertwined in the old controller. Changing this calculation separately and >> first would have changed the behavior of the old controller in ways that might >> have introduced regressions. Changing it after the thread control was changed >> would have made it more difficult to test and measure the thread control in a >> desirable configuration. >> >> The old calculation had various problems that are described in JDK-8155996. >> In particular, it can get more or less stuck at low values, and is slow to >> respond to changes. >> >> The old controller provided a number of product options, none of which were >> very useful for real applications, and none of which are very applicable to >> the new controller. All of these are being obsoleted. 
>> >> -XX:-G1UseAdaptiveConcRefinement >> -XX:G1ConcRefinementGreenZone= >> -XX:G1ConcRefinementYellowZone= >> -XX:G1ConcRefinementRedZone= >> -XX:G1ConcRefinementThresholdStep= >> >> The new controller *could* use G1ConcRefinementGreenZone to provide a fixed >> value for the target number of cards, though it is poorly named for that. >> >> A configuration that was useful for some kinds of debugging and testing was to >> disable G1UseAdaptiveConcRefinement and set g1ConcRefinementGreenZone to a >> very large value, effectively disabling concurrent refinement. To support >> this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic >> option has been added (see JDK-8155996). >> >> The other options are meaningless for the new controller. >> >> Because of these option changes, a CSR and a release note need to accompany >> this change. >> >> Testing: >> mach5 tier1-6 >> various performance tests. >> local (linux-x64) tier1 with -XX:-G1UseConcRefinement >> >> Performance testing found no regressions, but also little or no improvement >> with default options, which was expected. With default options most of our >> performance tests do very little concurrent refinement. And even for those >> that do, while the old controller had a number of problems, the impact of >> those problems is small and hard to measure for most applications. >> >> When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare >> better, particularly when also reducing MaxGCPauseMillis. specjbb2015 with >> MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options >> held constant) showed a statistically significant improvement of about 4.5% >> for critical-jOPS. Using the changed controller, the difference between this >> configuration and the default is fairly small, while the baseline shows >> significant degradation with the more restrictive options. >> >> For all tests and configurations the new controller often creates many fewer >> refinement threads. > > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - more copyright updates > - tschatzl comments I think `predict_dirtied_cards_in_thread_buffers` should not use `avg + stddev` as prediction since it doesn't follow normal distribution, but if the measured results are good enough, this is fine as well. ------------- Marked as reviewed by ayang (Reviewer). PR: https://git.openjdk.org/jdk/pull/10256 From dnsimon at openjdk.org Mon Oct 17 17:05:55 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 17 Oct 2022 17:05:55 GMT Subject: RFR: 8284614: on macOS "spindump" should be run from failure_handler as root In-Reply-To: <3DgeklO8yKV5sBiixQbD5piICBaLXz2Zy6GqJrb06Vc=.e78f565e-77c9-47ba-99c0-0a0bc8020b6b@github.com> References: <3DgeklO8yKV5sBiixQbD5piICBaLXz2Zy6GqJrb06Vc=.e78f565e-77c9-47ba-99c0-0a0bc8020b6b@github.com> Message-ID: On Mon, 17 Oct 2022 16:35:16 GMT, Leonid Mesnik wrote: > The fix is contributed by @plummercj actually. Marked as reviewed by dnsimon (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/10730 From shade at openjdk.org Mon Oct 17 18:08:21 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Oct 2022 18:08:21 GMT Subject: RFR: 8294468: Fix char-subscripts warnings in Hotspot [v2] In-Reply-To: References: Message-ID: > There seem to be the only place in Hotspot where this warning fires, yet the warning is disabled wholesale for Hotspot. This is not good. 
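To illustrate the reservation about `avg + stddev` with made-up numbers (not measured data): if four sampled per-thread buffer card counts were 0, 0, 0 and 100, the average is 25 and the sample standard deviation is 50, so `avg + stddev` predicts 75, well above the typical sample yet below the occasional spike. For such a skewed distribution, the normal-distribution intuition that mean plus one standard deviation covers most outcomes does not apply, which is the concern here; whether it matters in practice comes down to the measured results, as noted.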
> > I can trace the addition of char-subscripts exclusion to [JDK-8211029](https://bugs.openjdk.org/browse/JDK-8211029) (Sep 2018). The only place in Hotspot where in fires is present from the initial load (2007). > > The underlying problem that this warning tells us about is that `char` might be signed on some platforms, so we can potentially access the negative index. It is not a bug in our current code, that bounds the value of `k` under `MAXID-1`, which is `19`. > > Additional testing: > - [ ] Linux x86_64 fastdebug `tier1` > - [x] The build matrix of: > - GCC 10 > - {i686, x86_64, aarch64, powerpc64le, s390x, armhf, riscv64} > - {server} > - {release, fastdebug} Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Merge branch 'master' into JDK-8294468-warning-char-subscripts - Fix ------------- Changes: https://git.openjdk.org/jdk/pull/10455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10455&range=01 Stats: 4 lines in 2 files changed: 1 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10455.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10455/head:pull/10455 PR: https://git.openjdk.org/jdk/pull/10455 From shade at openjdk.org Mon Oct 17 18:08:56 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Oct 2022 18:08:56 GMT Subject: RFR: 8294591: Fix cast-function-type warning in TemplateTable [v4] In-Reply-To: References: Message-ID: <2jHTulm9tMRZEq_tFq7UMUJjWwIS7TIoOkfxjVsseG8=.e69ac9a2-cf39-4972-818a-77cdf0ed93d6@github.com> > After [JDK-8294314](https://bugs.openjdk.org/browse/JDK-8294314), we would have `templateTable.cpp` excluded with cast-function-type warning. The underlying cause for it is casting functions for `ldc` bytecodes, which take `bool`-typed handlers: Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Fix build failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10493/files - new: https://git.openjdk.org/jdk/pull/10493/files/5aa7e6b4..def34465 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10493&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10493&range=02-03 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10493.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10493/head:pull/10493 PR: https://git.openjdk.org/jdk/pull/10493 From shade at openjdk.org Mon Oct 17 18:27:44 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Oct 2022 18:27:44 GMT Subject: RFR: 8294438: Fix misleading-indentation warnings in hotspot [v2] In-Reply-To: References: Message-ID: > There are number of places where misleading-indentation is reported by GCC. Currently, the warning is disabled for the entirety of Hotspot, which is not good. > > C1 does an unusual style here. Changing it globally would touch a lot of lines. Instead of doing that, I fit the existing style while also resolving the warnings. Note this actually solves a bug in `lir_alloc_array`, where `do_temp` are called without a check. > > Build-tested this with product of: > - GCC 10 > - {i686, x86_64, aarch64, powerpc64le, s390x, armhf, riscv64} > - {server, zero} > - {release, fastdebug} > > Linux x86_64 fastdebug `tier1` is fine. Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Merge branch 'master' into JDK-8294438-misleading-indentation - Merge branch 'master' into JDK-8294438-misleading-indentation - Also javaClasses.cpp - Fix ------------- Changes: https://git.openjdk.org/jdk/pull/10444/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10444&range=01 Stats: 56 lines in 5 files changed: 7 ins; 20 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/10444.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10444/head:pull/10444 PR: https://git.openjdk.org/jdk/pull/10444 From vlivanov at openjdk.org Mon Oct 17 19:03:53 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 17 Oct 2022 19:03:53 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic FTR I did an exercise in source code archeology and here are my findings. The origin of `AlwaysRestoreFPU`-related code (both in x86-32 and arm-specific code) can be traced back to [JDK-6487931](https://bugs.openjdk.org/browse/JDK-6487931) and [JDK-6550813](https://bugs.openjdk.org/browse/JDK-6550813). Though both issues manifested as JVM crashes, the underlying problem was identified as FPU control word corruption by native code. The regression test does trigger the corruption from a JNI call (using either `_FPU_SETCW` [1] or `_controlfp` [2]), but it was deliberately limited to x86-32. Based on that, I conclude that the problem with FP environment corruption by native code was known before, but an opt-in solution was chosen. (Frankly speaking, I don't know why an opt-in solution was considered sufficient. Maybe because it was erroneously believed it can only lead to a crash at runtime?) Considering we are now aware about insidious nature of the problem (silent result corruption), I'm inclined to propose either to turn on `AlwaysRestoreFPU` by default (and provide implementation on platforms where it is missed) or, at least, catch FPU control word corruption on native->java transitions and crash the JVM advertising `-XX:+AlwaysRestoreFPU` as a solution. 
[1] https://man7.org/linux/man-pages/man3/__setfpucw.3.html [2] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/control87-controlfp-control87-2?view=msvc-170 ------------- PR: https://git.openjdk.org/jdk/pull/10661 From vkempik at openjdk.org Mon Oct 17 19:18:59 2022 From: vkempik at openjdk.org (Vladimir Kempik) Date: Mon, 17 Oct 2022 19:18:59 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v2] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Mon, 17 Oct 2022 12:13:18 GMT, Ludovic Henry wrote: >> Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. >> >> [1] https://github.com/riscv/riscv-CMOs >> [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero > > Ludovic Henry has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Fix comment > - Fix alignement > - Merge branch 'master' of github.com:openjdk/jdk into dev/ludovic/upstream-zicboz > - 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V > > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp line 117: > 115: } > 116: > 117: _cache_line_size = 64; Cache line size can be different on various cpus. Hardcoding it isn't nice. if it can't be reliably retrieved at runtime, maybe allow specifying it via some XX option ? ------------- PR: https://git.openjdk.org/jdk/pull/10718 From kbarrett at openjdk.org Mon Oct 17 19:45:06 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 17 Oct 2022 19:45:06 GMT Subject: RFR: 8294468: Fix char-subscripts warnings in Hotspot [v2] In-Reply-To: References: Message-ID: On Mon, 17 Oct 2022 18:08:21 GMT, Aleksey Shipilev wrote: >> There seem to be the only place in Hotspot where this warning fires, yet the warning is disabled wholesale for Hotspot. This is not good. >> >> I can trace the addition of char-subscripts exclusion to [JDK-8211029](https://bugs.openjdk.org/browse/JDK-8211029) (Sep 2018). The only place in Hotspot where in fires is present from the initial load (2007). >> >> The underlying problem that this warning tells us about is that `char` might be signed on some platforms, so we can potentially access the negative index. It is not a bug in our current code, that bounds the value of `k` under `MAXID-1`, which is `19`. >> >> Additional testing: >> - [ ] Linux x86_64 fastdebug `tier1` >> - [x] The build matrix of: >> - GCC 10 >> - {i686, x86_64, aarch64, powerpc64le, s390x, armhf, riscv64} >> - {server} >> - {release, fastdebug} > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8294468-warning-char-subscripts > - Fix Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
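For readers who have not run into -Wchar-subscripts before, a tiny self-contained example of the pattern under discussion (an illustration only, not the HotSpot code in question):

#include <cstdio>

static int table[256];

int lookup(char c) {
  // Where plain char is signed, e.g. c == (char)0xE9, table[c] would index
  // table[-23]; casting to unsigned char keeps the index in [0, 255] and
  // silences -Wchar-subscripts.
  return table[static_cast<unsigned char>(c)];
}

int main() {
  std::printf("%d\n", lookup('a'));
  return 0;
}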
PR: https://git.openjdk.org/jdk/pull/10455 From iklam at openjdk.org Mon Oct 17 20:13:10 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 17 Oct 2022 20:13:10 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time In-Reply-To: References: Message-ID: On Mon, 19 Sep 2022 04:33:55 GMT, Ioi Lam wrote: > Some `JVM_CONSTANT_Class` entries are guaranteed to resolve to the same value at both CDS dump time and run time: > > - Classes that are resolved during `vmClasses::resolve_all()`. These classes cannot be replaced by JVMTI agents at run time. > - Supertypes -- at run time, a class `C` can be loaded from the CDS archive only if all of `C`'s super types are also loaded from the CDS archive. Therefore, we know that a `JVM_CONSTANT_Class` reference to a supertype of `C` must resolved to the same value at both CDS dump time and run time. > > By doing the resolution at dump time, we can speed up run time start-up by a little bit. > > The `ClassPrelinker` class added by this PR will also be used in future REFs for pre-resolving other constant pool entries. The ultimate goal is to resolve `invokedynamic` and `invokehandle` so we can significantly improve the start-up time of features such as Lambda expressions and String concatenation. See [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) Ping - any reviewers? ------------- PR: https://git.openjdk.org/jdk/pull/10330 From dholmes at openjdk.org Mon Oct 17 22:06:08 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 17 Oct 2022 22:06:08 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic None of this seems adequate. If you guard the dlopen this is a race as the thread doing the dlopen could get switched out leading to corruption in other threads. But we can't just simply save/restore across JNI calls either as we may save the bad env if the dlopen just occurred! The pre-existing "fixes" in this area seem similarly inadequate (and my recollection of issues was that this was an opt-in band-aid while people migrated away from using problematic native libraries, or else got them fixed). ------------- PR: https://git.openjdk.org/jdk/pull/10661 From duke at openjdk.org Mon Oct 17 22:45:02 2022 From: duke at openjdk.org (Zixian Cai) Date: Mon, 17 Oct 2022 22:45:02 GMT Subject: RFR: 8295016: Make the arraycopy_epilogue signature consistent with its usage Message-ID: The second register argument should be named `count`. Please see the following usage. 
https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1138 https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1210 https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1584 Update: also fix for AArch64. ------------- Commit messages: - AArch64: Make the arraycopy_epilogue signature consistent with its usage - RISC-V: Make the arraycopy_epilogue signature consistent with its usage Changes: https://git.openjdk.org/jdk/pull/10620/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10620&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295016 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10620.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10620/head:pull/10620 PR: https://git.openjdk.org/jdk/pull/10620 From shade at openjdk.org Mon Oct 17 22:45:02 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Oct 2022 22:45:02 GMT Subject: RFR: 8295016: Make the arraycopy_epilogue signature consistent with its usage In-Reply-To: References: Message-ID: On Sat, 8 Oct 2022 06:06:25 GMT, Zixian Cai wrote: > The second register argument should be named `count`. Please see the following usage. > > https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1138 > > https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1210 > > https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1584 > > Update: also fix for AArch64. This looks good to me. Bots would allow you to integrate once OCA clears. Same problem exists in aarch64, please fix it there too? ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.org/jdk/pull/10620 From duke at openjdk.org Mon Oct 17 22:45:03 2022 From: duke at openjdk.org (Zixian Cai) Date: Mon, 17 Oct 2022 22:45:03 GMT Subject: RFR: 8295016: Make the arraycopy_epilogue signature consistent with its usage In-Reply-To: References: Message-ID: On Sat, 8 Oct 2022 06:06:25 GMT, Zixian Cai wrote: > The second register argument should be named `count`. Please see the following usage. > > https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1138 > > https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1210 > > https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1584 > > Update: also fix for AArch64. @shipilev @RealFYang It seems that you have to be an OpenJDK author to have a Jira account to open issues. Is it possible that someone with access could open an issue on my behalf? Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/10620 From fyang at openjdk.org Mon Oct 17 22:45:03 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 17 Oct 2022 22:45:03 GMT Subject: RFR: 8295016: Make the arraycopy_epilogue signature consistent with its usage In-Reply-To: References: Message-ID: On Sun, 9 Oct 2022 03:06:03 GMT, Zixian Cai wrote: > @shipilev @RealFYang It seems that you have to be an OpenJDK author to have a Jira account to open issues. 
Is it possible that someone with access could open an issue on my behalf? Thanks! Hi, I have created one for you: https://bugs.openjdk.org/browse/JDK-8295016 ------------- PR: https://git.openjdk.org/jdk/pull/10620 From duke at openjdk.org Mon Oct 17 22:45:05 2022 From: duke at openjdk.org (Zixian Cai) Date: Mon, 17 Oct 2022 22:45:05 GMT Subject: RFR: 8295016: Make the arraycopy_epilogue signature consistent with its usage In-Reply-To: References: Message-ID: On Mon, 10 Oct 2022 12:21:38 GMT, Aleksey Shipilev wrote: >> The second register argument should be named `count`. Please see the following usage. >> >> https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1138 >> >> https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1210 >> >> https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1584 >> >> Update: also fix for AArch64. > > Same problem exists in aarch64, please fix it there too? @shipilev fixed. ------------- PR: https://git.openjdk.org/jdk/pull/10620 From fyang at openjdk.org Mon Oct 17 22:45:05 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 17 Oct 2022 22:45:05 GMT Subject: RFR: 8295016: Make the arraycopy_epilogue signature consistent with its usage In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 04:06:00 GMT, Zixian Cai wrote: >> Same problem exists in aarch64, please fix it there too? > > @shipilev fixed. @caizixian : You should modify the title of this PR making it consistent with the title of the JBS issue. ------------- PR: https://git.openjdk.org/jdk/pull/10620 From duke at openjdk.org Mon Oct 17 22:45:06 2022 From: duke at openjdk.org (Zixian Cai) Date: Mon, 17 Oct 2022 22:45:06 GMT Subject: RFR: 8295016: Make the arraycopy_epilogue signature consistent with its usage In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 04:06:00 GMT, Zixian Cai wrote: >> Same problem exists in aarch64, please fix it there too? > > @shipilev fixed. > @caizixian : You should modify the title of this PR making it consistent with the title of the JBS issue. Done ------------- PR: https://git.openjdk.org/jdk/pull/10620 From sspitsyn at openjdk.org Mon Oct 17 23:39:15 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 17 Oct 2022 23:39:15 GMT Subject: Integrated: 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread In-Reply-To: References: Message-ID: On Wed, 5 Oct 2022 22:49:20 GMT, Serguei Spitsyn wrote: > The spec of JVM TI GetLocalXXX/SetLocalXXX functions is updated to require the target thread to be suspended. If not suspended then the JVMTI_ERROR_THREAD_NOT_SUSPENDED error code is returned by the implementation. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8294690 > > A few tests are impacted by this fix: > > test/hotspot/jtreg/serviceability/jvmti/vthread/GetSetLocalTest > test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadTest > test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/capability/CM01/cm01t011 > > > The following test has been removed as non-relevant any more: > ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java` > > New negative test has been added instead: > ` test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetSetLocalUnsuspended.java` > > All JVM TI and JPDA tests were used locally for verification. 
> They were also run in Loom repository with `JTREG_MAIN_WRAPPER=Virtual`. > > Mach5 test runs on all platforms are TBD. This pull request has now been integrated. Changeset: 21a825e0 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/21a825e059170e3a069b9f0982737c5839e6dae2 Stats: 1202 lines in 12 files changed: 457 ins; 694 del; 51 mod 8288387: GetLocalXXX/SetLocalXXX spec should require suspending target thread Reviewed-by: lmesnik, dsamersoff ------------- PR: https://git.openjdk.org/jdk/pull/10586 From coleenp at openjdk.org Tue Oct 18 00:43:52 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 18 Oct 2022 00:43:52 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time In-Reply-To: References: Message-ID: On Mon, 19 Sep 2022 04:33:55 GMT, Ioi Lam wrote: > Some `JVM_CONSTANT_Class` entries are guaranteed to resolve to the same value at both CDS dump time and run time: > > - Classes that are resolved during `vmClasses::resolve_all()`. These classes cannot be replaced by JVMTI agents at run time. > - Supertypes -- at run time, a class `C` can be loaded from the CDS archive only if all of `C`'s super types are also loaded from the CDS archive. Therefore, we know that a `JVM_CONSTANT_Class` reference to a supertype of `C` must resolved to the same value at both CDS dump time and run time. > > By doing the resolution at dump time, we can speed up run time start-up by a little bit. > > The `ClassPrelinker` class added by this PR will also be used in future REFs for pre-resolving other constant pool entries. The ultimate goal is to resolve `invokedynamic` and `invokehandle` so we can significantly improve the start-up time of features such as Lambda expressions and String concatenation. See [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) I have questions. src/hotspot/share/cds/classPrelinker.cpp line 48: > 46: InstanceKlass* super = ik->java_super(); > 47: if (super != NULL) { > 48: add_one_vm_class(super); Do you care about order? Would it be better to add the superclasses before the instanceKlass? src/hotspot/share/cds/classPrelinker.cpp line 59: > 57: ClassPrelinker::ClassPrelinker() { > 58: assert(_singleton == NULL, "must be"); > 59: _singleton = this; I'm not sure why you have this? src/hotspot/share/cds/classPrelinker.cpp line 116: > 114: Klass* ClassPrelinker::get_resolved_klass_or_null(ConstantPool* cp, int cp_index) { > 115: if (cp->tag_at(cp_index).is_klass()) { > 116: CPKlassSlot kslot = cp->klass_slot_at(cp_index); This is another place that CPKlassSlot leaks out. It would be nice if these were in constantPool. Maybe this function belongs in the constant pool. src/hotspot/share/cds/classPrelinker.cpp line 127: > 125: } > 126: > 127: bool ClassPrelinker::can_archive_resolved_klass(ConstantPool* cp, int cp_index) { Can the other can_archive_resolved_klass be just above this one so we know why their names are the same? Also can_archive_vm_resolved_klass is too closely named. src/hotspot/share/cds/classPrelinker.cpp line 146: > 144: > 145: bool first_time; > 146: _processed_classes.put_if_absent(ik, &first_time); I don't see any get functions for this hashtable? src/hotspot/share/cds/classPrelinker.cpp line 194: > 192: CPKlassSlot kslot = cp->klass_slot_at(cp_index); > 193: int name_index = kslot.name_index(); > 194: Symbol* name = cp->symbol_at(name_index); Isn't this klass_name_at() ? I think these details should stay in constantPool.cpp. 
src/hotspot/share/cds/classPrelinker.cpp line 205: > 203: > 204: #if INCLUDE_CDS_JAVA_HEAP > 205: void ClassPrelinker::resolve_string(constantPoolHandle cp, int cp_index, TRAPS) { Is this function only needed in this cpp file? Can you make it static and defined before its caller? Are there other functions where this is true? Does it need to be a member function of ClassPrelinker? ------------- PR: https://git.openjdk.org/jdk/pull/10330 From dholmes at openjdk.org Tue Oct 18 00:53:07 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 18 Oct 2022 00:53:07 GMT Subject: RFR: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events [v2] In-Reply-To: References: Message-ID: <1XyiaRUBadonxQ_XjryHn5duKVmXy931G9QleAH0iDA=.da2ccc52-50ab-491e-bc1e-7e5c133f6fcf@github.com> On Mon, 17 Oct 2022 12:08:01 GMT, Markus Gr?nlund wrote: >> Joakim Nordstr?m has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements >> - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements >> - Merge branch 'master' into JDK-8286707-jfr-dont-commit-jfr-internal-jdk-javamonitorwait-events >> - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events >> - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events >> - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events >> - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > > There exist a general exclusion/inclusion mechanism already. But it is an all-or-nothing proposition. This particular case is a thread that we can't exclude because it runs the periodic events, upon being notified. It is the notification mechanism to run the periodic events that trigger this large amount of unnecessary MonitorWait events. Even should we change it to some util.concurrent construct, we are only pushing the problem, because we might be instrumenting them later too. To work with the existing exclusion mechanism, the system would have to introduce an additional thread, which will be excluded, which only handles the notification, and then by some other means triggers another periodic thread (included) to run the periodic events. @mgronlun my request is that this filtering be done inside the commit logic by the JFR code, not at the site where the event is generated - ie this internal-jfr-event filtering is internalized into the JFR code. ------------- PR: https://git.openjdk.org/jdk/pull/8883 From duke at openjdk.org Tue Oct 18 00:56:12 2022 From: duke at openjdk.org (Zixian Cai) Date: Tue, 18 Oct 2022 00:56:12 GMT Subject: Integrated: 8295016: Make the arraycopy_epilogue signature consistent with its usage In-Reply-To: References: Message-ID: On Sat, 8 Oct 2022 06:06:25 GMT, Zixian Cai wrote: > The second register argument should be named `count`. Please see the following usage. > > https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1138 > > https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1210 > > https://github.com/openjdk/jdk/blob/495c043533d68106e07721b2e971006e9eba97e3/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L1584 > > Update: also fix for AArch64. This pull request has now been integrated. 
Changeset: 692cdab2 Author: Zixian Cai Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/692cdab2be7dfc6e12b127f8e2c97bc41536cb84 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8295016: Make the arraycopy_epilogue signature consistent with its usage Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/10620 From xlinzheng at openjdk.org Tue Oct 18 01:09:16 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Tue, 18 Oct 2022 01:09:16 GMT Subject: RFR: 8295396: RISC-V: Cleanup useless CompressibleRegions In-Reply-To: <8IC71PRQP2ha10s_zyrPU3fsqvCCMXSUFXC5gjS_DUw=.165ce3bb-92b6-4d4f-bfcd-4928c09ea352@github.com> References: <8IC71PRQP2ha10s_zyrPU3fsqvCCMXSUFXC5gjS_DUw=.165ce3bb-92b6-4d4f-bfcd-4928c09ea352@github.com> Message-ID: On Mon, 17 Oct 2022 07:57:00 GMT, Xiaolin Zheng wrote: > Cleanup no longer used old code introduced in the riscv-port repo after #10643, in which we have marked out all incompressible places so all other generated instructions could be safely compressed if compressible. Hence the old `CompressibleRegion`s are useless and need a cleanup. > > After this patch there are only two places using `CompressibleRegion`s: `MachNopNode::emit()` and `MacroAssembler::align()`. These `CompressibleRegion`s transforming the nops used for alignment purposes cannot be removed and the nops should be kept as 2-byte when RVC is enabled, for that we are at a 2-byte boundary and want to do `align(4)` with 4-byte nops would be impossible. So under RVC the nops for alignment purposes must be 2-byte ones. Others `CompressibleRegion`s are useless now. > > Has passed hotspot tier1~4 together with other patches; another tier1 is running. Thanks for the fast review! Trivial as this one looks, going to merge it. ------------- PR: https://git.openjdk.org/jdk/pull/10722 From xlinzheng at openjdk.org Tue Oct 18 01:19:48 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Tue, 18 Oct 2022 01:19:48 GMT Subject: Integrated: 8295396: RISC-V: Cleanup useless CompressibleRegions In-Reply-To: <8IC71PRQP2ha10s_zyrPU3fsqvCCMXSUFXC5gjS_DUw=.165ce3bb-92b6-4d4f-bfcd-4928c09ea352@github.com> References: <8IC71PRQP2ha10s_zyrPU3fsqvCCMXSUFXC5gjS_DUw=.165ce3bb-92b6-4d4f-bfcd-4928c09ea352@github.com> Message-ID: On Mon, 17 Oct 2022 07:57:00 GMT, Xiaolin Zheng wrote: > Cleanup no longer used old code introduced in the riscv-port repo after #10643, in which we have marked out all incompressible places so all other generated instructions could be safely compressed if compressible. Hence the old `CompressibleRegion`s are useless and need a cleanup. > > After this patch there are only two places using `CompressibleRegion`s: `MachNopNode::emit()` and `MacroAssembler::align()`. These `CompressibleRegion`s transforming the nops used for alignment purposes cannot be removed and the nops should be kept as 2-byte when RVC is enabled, for that we are at a 2-byte boundary and want to do `align(4)` with 4-byte nops would be impossible. So under RVC the nops for alignment purposes must be 2-byte ones. Others `CompressibleRegion`s are useless now. > > Has passed hotspot tier1~4 together with other patches; another tier1 is running. This pull request has now been integrated. 
Changeset: 529cc48f Author: Xiaolin Zheng Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/529cc48f355523fd162470b416a5081869adcf0e Stats: 68 lines in 2 files changed: 0 ins; 68 del; 0 mod 8295396: RISC-V: Cleanup useless CompressibleRegions Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/10722 From sspitsyn at openjdk.org Tue Oct 18 01:27:57 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 18 Oct 2022 01:27:57 GMT Subject: RFR: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 Message-ID: The JDI ClassUnloadEvent events are synthesized by the JDWP agent from the JVM TI ObjectFree events. The JVM TI ObjectFree events are flushed when the JVM TI SetEventNotificationMode is used to disable the ObjectFree events. It is not very helpful for the JDWP agent as the ObjectFree events are always enabled. The fix is to flush all pending ObjectFree events at the VM shutdown. Testing: All mach5 jobs with JVMTI/JDI tests and tiers 1-6 were successfully passed on 3 debug platforms. ------------- Commit messages: - 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 Changes: https://git.openjdk.org/jdk/pull/10736/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10736&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8291456 Stats: 9 lines in 2 files changed: 2 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10736.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10736/head:pull/10736 PR: https://git.openjdk.org/jdk/pull/10736 From cjplummer at openjdk.org Tue Oct 18 01:37:57 2022 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 18 Oct 2022 01:37:57 GMT Subject: RFR: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 01:20:15 GMT, Serguei Spitsyn wrote: > The JDI ClassUnloadEvent events are synthesized by the JDWP agent from the JVM TI ObjectFree events. > The JVM TI ObjectFree events are flushed when the JVM TI SetEventNotificationMode is used to disable the ObjectFree events. It is not very helpful for the JDWP agent as the ObjectFree events are always enabled. > The fix is to flush all pending ObjectFree events at the VM shutdown. > > Testing: > > All mach5 jobs with JVMTI/JDI tests and tiers 1-6 were successfully passed on 3 debug platforms. The changes look good. Did you first make sure you could reproduce the problem without the fix, and then verify with the fix? ------------- PR: https://git.openjdk.org/jdk/pull/10736 From xgong at openjdk.org Tue Oct 18 01:44:21 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 18 Oct 2022 01:44:21 GMT Subject: RFR: 8293409: [vectorapi] Intrinsify VectorSupport.indexVector [v2] In-Reply-To: References: Message-ID: > "`VectorSupport.indexVector()`" is used to compute a vector that contains the index values based on a given vector and a scale value (`i.e. index = vec + iota * scale`). This function is widely used in other APIs like "`VectorMask.indexInRange`" which is useful to the tail loop vectorization. And it can be easily implemented with the vector instructions. > > This patch adds the vector intrinsic implementation of it. The steps are: > > 1) Load the const "iota" vector. > > We extend the "`vector_iota_indices`" stubs from byte to other integral types.
For floating point vectors, it needs an additional vector cast to get the right iota values. > > 2) Compute indexes with "`vec + iota * scale`" > > Here is the performance result to the new added micro benchmark on ARM NEON: > > Benchmark Gain > IndexVectorBenchmark.byteIndexVector 1.477 > IndexVectorBenchmark.doubleIndexVector 5.031 > IndexVectorBenchmark.floatIndexVector 5.342 > IndexVectorBenchmark.intIndexVector 5.529 > IndexVectorBenchmark.longIndexVector 3.177 > IndexVectorBenchmark.shortIndexVector 5.841 > > > Please help to review and share the feedback! Thanks in advance! Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Add the floating point support for VectorLoadConst and remove the VectorCast - Merge branch 'master' into JDK-8293409 - 8293409: [vectorapi] Intrinsify VectorSupport.indexVector ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10332/files - new: https://git.openjdk.org/jdk/pull/10332/files/2ad157b6..53f042d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10332&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10332&range=00-01 Stats: 50675 lines in 1239 files changed: 30581 ins; 14395 del; 5699 mod Patch: https://git.openjdk.org/jdk/pull/10332.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10332/head:pull/10332 PR: https://git.openjdk.org/jdk/pull/10332 From xgong at openjdk.org Tue Oct 18 01:44:21 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 18 Oct 2022 01:44:21 GMT Subject: RFR: 8293409: [vectorapi] Intrinsify VectorSupport.indexVector In-Reply-To: References: Message-ID: <0yv4LhxY5GqaiuhoxdB7tmmJlik-m9B_2BYWkdDCSTU=.0c97a482-164d-4d14-8a3e-8a6b2c3a34c6@github.com> On Mon, 19 Sep 2022 08:51:24 GMT, Xiaohong Gong wrote: > "`VectorSupport.indexVector()`" is used to compute a vector that contains the index values based on a given vector and a scale value (`i.e. index = vec + iota * scale`). This function is widely used in other APIs like "`VectorMask.indexInRange`" which is useful to the tail loop vectorization. And it can be easily implemented with the vector instructions. > > This patch adds the vector intrinsic implementation of it. The steps are: > > 1) Load the const "iota" vector. > > We extend the "`vector_iota_indices`" stubs from byte to other integral types. For floating point vectors, it needs an additional vector cast to get the right iota values. > > 2) Compute indexes with "`vec + iota * scale`" > > Here is the performance result to the new added micro benchmark on ARM NEON: > > Benchmark Gain > IndexVectorBenchmark.byteIndexVector 1.477 > IndexVectorBenchmark.doubleIndexVector 5.031 > IndexVectorBenchmark.floatIndexVector 5.342 > IndexVectorBenchmark.intIndexVector 5.529 > IndexVectorBenchmark.longIndexVector 3.177 > IndexVectorBenchmark.shortIndexVector 5.841 > > > Please help to review and share the feedback! Thanks in advance! Hi @jatin-bhateja , all your comments have been addressed. Please help to look at the changes again! Thanks in advance! 
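The `index = vec + iota * scale` computation described above has simple lane-wise semantics. A scalar C++ sketch of that meaning (illustrative only; the names are made up, and this is neither the HotSpot intrinsic nor the Vector API code):

```c++
#include <cstddef>

// Lane-wise meaning of VectorSupport.indexVector: index = vec + iota * scale,
// where iota is the per-lane index vector {0, 1, 2, ...}.
// 'lanes' stands for the vector length; all names here are illustrative.
template <typename T>
void index_vector_semantics(const T* vec, T* index, T scale, size_t lanes) {
  for (size_t i = 0; i < lanes; i++) {
    index[i] = vec[i] + static_cast<T>(i) * scale;  // iota[i] == i
  }
}
```

The intrinsic implements this with a single load of the pre-computed iota constant (the `vector_iota_indices` stub) followed by a vector multiply and add, as described in the steps above.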
------------- PR: https://git.openjdk.org/jdk/pull/10332 From xgong at openjdk.org Tue Oct 18 01:44:21 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 18 Oct 2022 01:44:21 GMT Subject: RFR: 8293409: [vectorapi] Intrinsify VectorSupport.indexVector [v2] In-Reply-To: <_wyFWAET_qXwwj-9Iq9AsPAGbT3AXIwN6HujmwZVRPw=.9c652886-4255-4c03-89d9-e3c74f9f319a@github.com> References: <_wyFWAET_qXwwj-9Iq9AsPAGbT3AXIwN6HujmwZVRPw=.9c652886-4255-4c03-89d9-e3c74f9f319a@github.com> Message-ID: On Thu, 13 Oct 2022 07:04:25 GMT, Jatin Bhateja wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Add the floating point support for VectorLoadConst and remove the VectorCast >> - Merge branch 'master' into JDK-8293409 >> - 8293409: [vectorapi] Intrinsify VectorSupport.indexVector > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2949: > >> 2947: } else if (elem_bt == T_DOUBLE) { >> 2948: iota = gvn().transform(new VectorCastL2XNode(iota, vt)); >> 2949: } > > Since we are loading constants from stub initialized memory locations, defining new stubs for floating point iota indices may eliminate need for costly conversion instructions. Specially on X86 conversion b/w Long and Double is only supported by AVX512DQ targets and intrinsification may fail for legacy targets. Make sense to me! I'v changed the codes based on the suggestion in the latest commit. Please help to take a review again! Thanks a lot! ------------- PR: https://git.openjdk.org/jdk/pull/10332 From sspitsyn at openjdk.org Tue Oct 18 01:47:58 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 18 Oct 2022 01:47:58 GMT Subject: RFR: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 01:20:15 GMT, Serguei Spitsyn wrote: > The JDI ClassUnloadEvent events are synthesized by the JDWP agent from the JVM TI ObjectFree events. > The JVM TI ObjectFree events are flushed when the JVM TI SetEvenNotificationMode is used to disable the ObjectFree events. It is not very helpful for JDWP agent as the ObjectFree events are always enabled. > The fix is to flush all pending ObjectFree events at the VM shutdown. > > Testing: > > All mach5 jobs with JVMTI/JDI tests and tiers 1-6 were successfully passed on 3 debug platforms. Yes, I was able to reproduce the original issue without my fix. It is not reproducible anymore with the fix. ------------- PR: https://git.openjdk.org/jdk/pull/10736 From njian at openjdk.org Tue Oct 18 02:01:07 2022 From: njian at openjdk.org (Ningsheng Jian) Date: Tue, 18 Oct 2022 02:01:07 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v2] In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 10:12:42 GMT, Bhavana Kilambi wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. 
For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Modified JTREG test to include feature constraints lgtm ------------- Marked as reviewed by njian (Committer). PR: https://git.openjdk.org/jdk/pull/10407 From dholmes at openjdk.org Tue Oct 18 02:03:10 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 18 Oct 2022 02:03:10 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic For the benefit of the mailing list my previous comment has been edited/corrected: Correction: I was under the false impression the FP-state was shared at the process level but I've been told it is per-thread. So the scope of the problem is much narrower. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From dholmes at openjdk.org Tue Oct 18 02:26:55 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 18 Oct 2022 02:26:55 GMT Subject: RFR: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 01:20:15 GMT, Serguei Spitsyn wrote: > The JDI ClassUnloadEvent events are synthesized by the JDWP agent from the JVM TI ObjectFree events. > The JVM TI ObjectFree events are flushed when the JVM TI SetEvenNotificationMode is used to disable the ObjectFree events. It is not very helpful for JDWP agent as the ObjectFree events are always enabled. > The fix is to flush all pending ObjectFree events at the VM shutdown. 
> > Testing: > > All mach5 jobs with JVMTI/JDI tests and tiers 1-6 were successfully passed on 3 debug platforms. The agent will be racing against VM termination and will need to be resilient to potentially getting "wrong phase" errors, or similar. ------------- PR: https://git.openjdk.org/jdk/pull/10736 From sspitsyn at openjdk.org Tue Oct 18 02:43:03 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 18 Oct 2022 02:43:03 GMT Subject: RFR: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 02:23:12 GMT, David Holmes wrote: > The agent will be racing against VM termination and will need to be > resilient to potentially getting "wrong phase" errors, or similar. I'm not sure what you mean. There is always this kind of race but this fix does not make it worse. The JDWP agent does all needed to be resistant to the WRONG_PHASE errors. If you talk about all other JVM TI agents then it is a common problem which does not have a solution yet. It means, all JVM TI agents should check for the WRONG_PHASE errors and process them as needed. ------------- PR: https://git.openjdk.org/jdk/pull/10736 From cjplummer at openjdk.org Tue Oct 18 02:48:50 2022 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 18 Oct 2022 02:48:50 GMT Subject: RFR: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 01:20:15 GMT, Serguei Spitsyn wrote: > The JDI ClassUnloadEvent events are synthesized by the JDWP agent from the JVM TI ObjectFree events. > The JVM TI ObjectFree events are flushed when the JVM TI SetEvenNotificationMode is used to disable the ObjectFree events. It is not very helpful for JDWP agent as the ObjectFree events are always enabled. > The fix is to flush all pending ObjectFree events at the VM shutdown. > > Testing: > > All mach5 jobs with JVMTI/JDI tests and tiers 1-6 were successfully passed on 3 debug platforms. Marked as reviewed by cjplummer (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10736 From dholmes at openjdk.org Tue Oct 18 04:21:40 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 18 Oct 2022 04:21:40 GMT Subject: RFR: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 02:39:11 GMT, Serguei Spitsyn wrote: > There is always this kind of race but this fix does not make it any worse. Before this fix events were not getting flushed so at VM exit the agent was idle with respect to those events and the test failed because events were missing. Now you are flushing those events are VM exit and so the agent will now be very active at VM exit, and racing against the rest of the termination sequence. > Please, note, the ObjectFree events in this fix are flushed/posted synchronously before the VMDeath event. Do you mean that before the flush operation returns all events are guaranteed to have been processed? 
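For context on the flush-on-disable behaviour mentioned in the description: an agent that can afford to toggle the event would flush pending ObjectFree events with something like the following sketch (hypothetical agent code, not the JDWP agent and not part of this patch):

```c++
#include <jvmti.h>

// Per the description above, disabling ObjectFree notifications causes the VM
// to post (flush) any pending ObjectFree events. The JDWP agent keeps
// ObjectFree permanently enabled, so it cannot use this toggle, which is why
// the fix flushes the pending events at VM shutdown instead.
static void flush_object_free_events(jvmtiEnv* jvmti) {
  jvmti->SetEventNotificationMode(JVMTI_DISABLE, JVMTI_EVENT_OBJECT_FREE, NULL);
  jvmti->SetEventNotificationMode(JVMTI_ENABLE,  JVMTI_EVENT_OBJECT_FREE, NULL);
}
```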
------------- PR: https://git.openjdk.org/jdk/pull/10736 From sspitsyn at openjdk.org Tue Oct 18 04:28:50 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 18 Oct 2022 04:28:50 GMT Subject: RFR: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 04:18:33 GMT, David Holmes wrote: > Do you mean that before the flush operation returns all events are guaranteed to have been processed? Exactly. And it happens before the VMDeath event is posted. ------------- PR: https://git.openjdk.org/jdk/pull/10736 From dholmes at openjdk.org Tue Oct 18 05:47:37 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 18 Oct 2022 05:47:37 GMT Subject: RFR: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 01:20:15 GMT, Serguei Spitsyn wrote: > The JDI ClassUnloadEvent events are synthesized by the JDWP agent from the JVM TI ObjectFree events. > The JVM TI ObjectFree events are flushed when the JVM TI SetEvenNotificationMode is used to disable the ObjectFree events. It is not very helpful for JDWP agent as the ObjectFree events are always enabled. > The fix is to flush all pending ObjectFree events at the VM shutdown. > > Testing: > > All mach5 jobs with JVMTI/JDI tests and tiers 1-6 were successfully passed on 3 debug platforms. Ah sorry. I was thinking the events were flushed to a queue and the processed later. ------------- PR: https://git.openjdk.org/jdk/pull/10736 From sspitsyn at openjdk.org Tue Oct 18 06:11:07 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 18 Oct 2022 06:11:07 GMT Subject: RFR: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 01:20:15 GMT, Serguei Spitsyn wrote: > The JDI ClassUnloadEvent events are synthesized by the JDWP agent from the JVM TI ObjectFree events. > The JVM TI ObjectFree events are flushed when the JVM TI SetEvenNotificationMode is used to disable the ObjectFree events. It is not very helpful for JDWP agent as the ObjectFree events are always enabled. > The fix is to flush all pending ObjectFree events at the VM shutdown. > > Testing: > > All mach5 jobs with JVMTI/JDI tests and tiers 1-6 were successfully passed on 3 debug platforms. No problem. Sorry, it was not clear. ------------- PR: https://git.openjdk.org/jdk/pull/10736 From iklam at openjdk.org Tue Oct 18 06:56:14 2022 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 18 Oct 2022 06:56:14 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v2] In-Reply-To: References: Message-ID: > Some `JVM_CONSTANT_Class` entries are guaranteed to resolve to the same value at both CDS dump time and run time: > > - Classes that are resolved during `vmClasses::resolve_all()`. These classes cannot be replaced by JVMTI agents at run time. > - Supertypes -- at run time, a class `C` can be loaded from the CDS archive only if all of `C`'s super types are also loaded from the CDS archive. Therefore, we know that a `JVM_CONSTANT_Class` reference to a supertype of `C` must resolved to the same value at both CDS dump time and run time. > > By doing the resolution at dump time, we can speed up run time start-up by a little bit. 
> > The `ClassPrelinker` class added by this PR will also be used in future REFs for pre-resolving other constant pool entries. The ultimate goal is to resolve `invokedynamic` and `invokehandle` so we can significantly improve the start-up time of features such as Lambda expressions and String concatenation. See [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - @coleenp comments - Merge branch 'master' into 8293979-resolve-class-references-at-dumptime - 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10330/files - new: https://git.openjdk.org/jdk/pull/10330/files/48a56524..9bf8cb4c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10330&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10330&range=00-01 Stats: 59174 lines in 1659 files changed: 33804 ins; 16738 del; 8632 mod Patch: https://git.openjdk.org/jdk/pull/10330.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10330/head:pull/10330 PR: https://git.openjdk.org/jdk/pull/10330 From iklam at openjdk.org Tue Oct 18 06:56:15 2022 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 18 Oct 2022 06:56:15 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v2] In-Reply-To: References: Message-ID: <8iNwjxGcRg1uK4X6s7Bk0Sv4qWbYiE_2-Avo3X65xc4=.5583d1df-b49a-419a-aee6-f3f6fffb5b63@github.com> On Mon, 17 Oct 2022 23:02:00 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - @coleenp comments >> - Merge branch 'master' into 8293979-resolve-class-references-at-dumptime >> - 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time > > src/hotspot/share/cds/classPrelinker.cpp line 48: > >> 46: InstanceKlass* super = ik->java_super(); >> 47: if (super != NULL) { >> 48: add_one_vm_class(super); > > Do you care about order? Would it be better to add the superclasses before the instanceKlass? The order doesn't matter since this is a hashtable. Also, I am trying to avoid walking up the hierarchy if a class is already in the table. > src/hotspot/share/cds/classPrelinker.cpp line 59: > >> 57: ClassPrelinker::ClassPrelinker() { >> 58: assert(_singleton == NULL, "must be"); >> 59: _singleton = this; > > I'm not sure why you have this? I was trying to put the states of the ClassPrelinker in its instance fields, but this is probably confusing. I'll change ClassPrelinker to AllStatic instead. > src/hotspot/share/cds/classPrelinker.cpp line 116: > >> 114: Klass* ClassPrelinker::get_resolved_klass_or_null(ConstantPool* cp, int cp_index) { >> 115: if (cp->tag_at(cp_index).is_klass()) { >> 116: CPKlassSlot kslot = cp->klass_slot_at(cp_index); > > This is another place that CPKlassSlot leaks out. It would be nice if these were in constantPool. Maybe this function belongs in the constant pool. Changed to use ConstantPool::resolved_klass_at() instead. 
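The dump-time rule described in the summary can be sketched in a few lines (simplified pseudo-HotSpot code, not the actual classPrelinker.cpp source; `is_vm_class` is a stand-in for whatever bookkeeping the real `ClassPrelinker` uses):

```c++
// A resolved JVM_CONSTANT_Class entry is safe to archive if it refers either to
// a class resolved during vmClasses::resolve_all(), or to a supertype of the
// constant pool's holder (supertypes are always archived with the holder).
bool can_archive_resolved_klass(ConstantPool* cp, int cp_index) {
  Klass* resolved = cp->resolved_klass_at(cp_index);
  InstanceKlass* holder = cp->pool_holder();
  return is_vm_class(resolved)              // resolved by vmClasses::resolve_all()
      || holder->is_subtype_of(resolved);   // 'resolved' is a supertype of 'holder'
}
```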
> src/hotspot/share/cds/classPrelinker.cpp line 127: > >> 125: } >> 126: >> 127: bool ClassPrelinker::can_archive_resolved_klass(ConstantPool* cp, int cp_index) { > > Can the other can_archive_resolved_klass be just above this one so we know why their names are the same? Also can_archive_vm_resolved_klass is too closely named. I moved the two `can_archive_resolved_klass()` functions next to each other. I also folded `can_archive_vm_resolved_klass` into `can_archive_resolved_klass(InstanceKlass*, Klass*)`, simplified it and improved the comments. > src/hotspot/share/cds/classPrelinker.cpp line 146: > >> 144: >> 145: bool first_time; >> 146: _processed_classes.put_if_absent(ik, &first_time); > > I don't see any get functions for this hashtable? This table is used to check if we have already worked on the class: bool first_time; _processed_classes.put_if_absent(ik, &first_time); if (!first_time) { return; } > src/hotspot/share/cds/classPrelinker.cpp line 194: > >> 192: CPKlassSlot kslot = cp->klass_slot_at(cp_index); >> 193: int name_index = kslot.name_index(); >> 194: Symbol* name = cp->symbol_at(name_index); > > Isn't this klass_name_at() ? I think these details should stay in constantPool.cpp. I changed it to use klass_name_at(). > src/hotspot/share/cds/classPrelinker.cpp line 205: > >> 203: >> 204: #if INCLUDE_CDS_JAVA_HEAP >> 205: void ClassPrelinker::resolve_string(constantPoolHandle cp, int cp_index, TRAPS) { > > Is this function only needed in this cpp file? Can you make it static and defined before its caller? Are there other functions where this is true? Does it need to be a member function of ClassPrelinker? I prefer to make such methods private, so it can access other private members of the same class. ------------- PR: https://git.openjdk.org/jdk/pull/10330 From duke at openjdk.org Tue Oct 18 07:32:59 2022 From: duke at openjdk.org (Zixian Cai) Date: Tue, 18 Oct 2022 07:32:59 GMT Subject: RFR: 8295457: Make the signatures of write barrier methods consistent Message-ID: Currently, the signatures for various write barrier related methods are inconsistent and can be a bit confusing. Let's take x86 as an example. The `store_at` of `BarrierSetAssembler` uses `dst` and `val`, and similarly for `oop_store_at` of `CardTableBarrierSetAssembler`. https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp#L50 https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.hpp#L38 However, `access_store_at` and `store_heap_oop` of `MacroAssembler` use `dst` and `src`, presumably copied from `access_load_at` and `load_heap_oop` respectively. https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L355 https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L362 This PR cleans up the signature of write barrier methods across affected architectures, hopefully making them less confusing. 
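Concretely, the change is just a parameter rename; the shape is roughly the following (simplified declarations, not the exact x86 parameter lists):

```c++
// Before: the stored value is named 'src', mirroring the load methods.
void access_store_at(BasicType type, DecoratorSet decorators,
                     Address dst, Register src, Register tmp1, Register tmp2);

// After: the stored value is named 'val', matching BarrierSetAssembler::store_at
// and CardTableBarrierSetAssembler::oop_store_at.
void access_store_at(BasicType type, DecoratorSet decorators,
                     Address dst, Register val, Register tmp1, Register tmp2);
```

Since only parameter names change, no generated code is affected, which is consistent with the diff stats below (23 lines modified, nothing inserted or deleted).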
------------- Commit messages: - Fix store_heap_oop - Fix access_store_at Changes: https://git.openjdk.org/jdk/pull/10739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10739&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295457 Stats: 23 lines in 7 files changed: 0 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/10739.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10739/head:pull/10739 PR: https://git.openjdk.org/jdk/pull/10739 From shade at openjdk.org Tue Oct 18 07:32:59 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Oct 2022 07:32:59 GMT Subject: RFR: 8295457: Make the signatures of write barrier methods consistent In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 04:18:13 GMT, Zixian Cai wrote: > Currently, the signatures for various write barrier related methods are inconsistent and can be a bit confusing. Let's take x86 as an example. > > The `store_at` of `BarrierSetAssembler` uses `dst` and `val`, and similarly for `oop_store_at` of `CardTableBarrierSetAssembler`. > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp#L50 > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.hpp#L38 > > However, `access_store_at` and `store_heap_oop` of `MacroAssembler` use `dst` and `src`, presumably copied from `access_load_at` and `load_heap_oop` respectively. > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L355 > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L362 > > This PR cleans up the signature of write barrier methods across affected architectures, hopefully making them less confusing. Submitted https://bugs.openjdk.org/browse/JDK-8295457 for this PR, please rename it to: "8295457: Make the signatures of write barrier methods consistent" to let bots hook it up. ------------- PR: https://git.openjdk.org/jdk/pull/10739 From duke at openjdk.org Tue Oct 18 07:39:00 2022 From: duke at openjdk.org (Zixian Cai) Date: Tue, 18 Oct 2022 07:39:00 GMT Subject: RFR: 8295457: Make the signatures of write barrier methods consistent In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 07:26:36 GMT, Aleksey Shipilev wrote: >> Currently, the signatures for various write barrier related methods are inconsistent and can be a bit confusing. Let's take x86 as an example. >> >> The `store_at` of `BarrierSetAssembler` uses `dst` and `val`, and similarly for `oop_store_at` of `CardTableBarrierSetAssembler`. >> >> https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp#L50 >> >> https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.hpp#L38 >> >> However, `access_store_at` and `store_heap_oop` of `MacroAssembler` use `dst` and `src`, presumably copied from `access_load_at` and `load_heap_oop` respectively. 
>> >> https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L355 >> >> https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L362 >> >> This PR cleans up the signature of write barrier methods across affected architectures, hopefully making them less confusing. > > Submitted https://bugs.openjdk.org/browse/JDK-8295457 for this PR, please rename it to: "8295457: Make the signatures of write barrier methods consistent" to let bots hook it up. @shipilev Thanks, I've just fixed the title. Tests have all passed. ------------- PR: https://git.openjdk.org/jdk/pull/10739 From fweimer at openjdk.org Tue Oct 18 07:49:01 2022 From: fweimer at openjdk.org (Florian Weimer) Date: Tue, 18 Oct 2022 07:49:01 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic I wonder if something that focuses on diagnostic tools might be better here, particularly if there hasn't been any reported breakage. The `dlopen` protection is of course very incomplete because any JNI call can change the state in unexpected ways. On the other hand, it seems unlikely that this change breaks some undefined but intended use of the FPU state because if it is changed in `dlopen`, it's not going to be propagated across threads. src/hotspot/os/linux/os_linux.cpp line 1759: > 1757: assert(rtn == 0, "fegetnv must succeed"); > 1758: void * result = ::dlopen(filename, RTLD_LAZY); > 1759: rtn = fesetenv(&curr_fenv); `fesetenv` in glibc does not unconditionally restore the old FPU control bits, only the non-reserved bits. (The mask is 0xf3f, which seems to cover everything in use today.) It seems unlikely that more bits are going to be defined in the future. But using `fesetenv` to essentially guard against undefined behavior after the fact is a bit awkward. MXSCR is passed through unconditionally. test/hotspot/jtreg/compiler/floatingpoint/libfast-math.c line 24: > 22: */ > 23: > 24: // This file is intentionally left blank. This test will silently break (no longer test what's intended) once we change GCC not to link with `crtfastmath.o` for `-shared`. Maybe you should link with `crtfastmath.o` explicitly if it exists, or change the floating point control word directly in an ELF constructor. 
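On x86, the ELF-constructor variant could look roughly like this (a sketch only; the helper name is made up and the build glue for the jtreg test is not shown):

```c++
// Sets the same MXCSR bits that crtfastmath.o sets, so the test library keeps
// breaking denormal arithmetic even if GCC stops linking crtfastmath.o into
// shared objects. x86-only: bit 15 is FTZ (flush-to-zero), bit 6 is DAZ
// (denormals-are-zero).
#include <xmmintrin.h>

__attribute__((constructor))
static void enable_fast_math_mode() {
  _mm_setcsr(_mm_getcsr() | 0x8040);  // 0x8000 = FTZ, 0x0040 = DAZ
}
```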
------------- PR: https://git.openjdk.org/jdk/pull/10661 From tschatzl at openjdk.org Tue Oct 18 08:14:54 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 18 Oct 2022 08:14:54 GMT Subject: RFR: 8295457: Make the signatures of write barrier methods consistent In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 04:18:13 GMT, Zixian Cai wrote: > Currently, the signatures for various write barrier related methods are inconsistent and can be a bit confusing. Let's take x86 as an example. > > The `store_at` of `BarrierSetAssembler` uses `dst` and `val`, and similarly for `oop_store_at` of `CardTableBarrierSetAssembler`. > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp#L50 > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.hpp#L38 > > However, `access_store_at` and `store_heap_oop` of `MacroAssembler` use `dst` and `src`, presumably copied from `access_load_at` and `load_heap_oop` respectively. > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L355 > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L362 > > This PR cleans up the signature of write barrier methods across affected architectures, hopefully making them less confusing. Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10739 From tschatzl at openjdk.org Tue Oct 18 08:19:09 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 18 Oct 2022 08:19:09 GMT Subject: RFR: 8295257: Remove implicit noreg temp register arguments in aarch64 MacroAssembler In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 06:12:49 GMT, Axel Boldt-Christmas wrote: > Remove implicit `= noreg` temporary register arguments for the three methods that still have them. > * `load_heap_oop` > * `store_heap_oop` > * `load_heap_oop_not_null` > > Only `load_heap_oop` is used with the implicit `= noreg` arguments. > After [JDK-8293351](https://bugs.openjdk.org/browse/JDK-8293351) the GCs only use explicitly passed in registers. This will also be the case for generational ZGC. Where it currently requires `load_heap_oop` to provide a second temporary register. > > Testing: linux-aarch64, macosx-aarch64 tier 1-3 Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10688 From shade at openjdk.org Tue Oct 18 08:21:04 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Oct 2022 08:21:04 GMT Subject: RFR: 8295457: Make the signatures of write barrier methods consistent In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 04:18:13 GMT, Zixian Cai wrote: > Currently, the signatures for various write barrier related methods are inconsistent and can be a bit confusing. Let's take x86 as an example. > > The `store_at` of `BarrierSetAssembler` uses `dst` and `val`, and similarly for `oop_store_at` of `CardTableBarrierSetAssembler`. 
> > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp#L50 > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.hpp#L38 > > However, `access_store_at` and `store_heap_oop` of `MacroAssembler` use `dst` and `src`, presumably copied from `access_load_at` and `load_heap_oop` respectively. > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L355 > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L362 > > This PR cleans up the signature of write barrier methods across affected architectures, hopefully making them less confusing. Do you want to check `macroAssembler_ppc.*`, `macroAssembler_arm.*` too? ------------- PR: https://git.openjdk.org/jdk/pull/10739 From luhenry at openjdk.org Tue Oct 18 08:22:22 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 18 Oct 2022 08:22:22 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v2] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: <-1yE3vHmtefKP7t2B44mCrTaN7SmBjN97ck03xige_U=.4e5327bc-8fbb-45fc-920d-335e790c0507@github.com> On Sun, 16 Oct 2022 08:44:01 GMT, Vladimir Kempik wrote: >> Ludovic Henry has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Fix comment >> - Fix alignement >> - Merge branch 'master' of github.com:openjdk/jdk into dev/ludovic/upstream-zicboz >> - 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V >> >> Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. >> >> [1] https://github.com/riscv/riscv-CMOs >> [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero > > src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp line 117: > >> 115: } >> 116: >> 117: _cache_line_size = 64; > > Cache line size can be different on various cpus. Hardcoding it isn't nice. if it can't be reliably retrieved at runtime, maybe allow specifying it via some XX option ? Agreed. I don't think there is an existing API on RISC-V to get the current cache line size. I'll add a `-XX:` flag instead. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From tschatzl at openjdk.org Tue Oct 18 08:26:12 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 18 Oct 2022 08:26:12 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v7] In-Reply-To: References: Message-ID: <-HwZIBxmGBV6qCyB8mWh-Pm3T75BCS9x7zSgVSGVxIc=.84c8eed9-3954-44e8-8af7-73a4513319a4@github.com> On Mon, 17 Oct 2022 11:50:03 GMT, Kim Barrett wrote: >> 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal >> 8155996: Improve concurrent refinement green zone control >> 8134303: Introduce -XX:-G1UseConcRefinement >> >> Please review this change to the control of concurrent refinement. >> >> This new controller takes a different approach to the problem, addressing a >> number of issues. 
>> >> The old controller used a multiple of the target number of cards to determine >> the range over which increasing numbers of refinement threads should be >> activated, and finally activating mutator refinement. This has a variety of >> problems. It doesn't account for the processing rate, the rate of new dirty >> cards, or the time available to perform the processing. This often leads to >> unnecessary spikes in the number of running refinement threads. It also tends >> to drive the pending number to the target quickly and keep it there, removing >> the benefit from having pending dirty cards filter out new cards for nearby >> writes. It can't delay and leave excess cards in the queue because it could >> be a long time before another buffer is enqueued. >> >> The old controller was triggered by mutator threads enqueing card buffers, >> when the number of cards in the queue exceeded a threshold near the target. >> This required a complex activation protocol between the mutators and the >> refinement threads. >> >> With the new controller there is a primary refinement thread that periodically >> estimates how many refinement threads need to be running to reach the target >> in time for the next GC, along with whether to also activate mutator >> refinement. If the primary thread stops running because it isn't currently >> needed, it sleeps for a period and reevaluates on wakeup. This eliminates any >> involvement in the activation of refinement threads by mutator threads. >> >> The estimate of how many refinement threads are needed uses a prediction of >> time until the next GC, the number of buffered cards, the predicted rate of >> new dirty cards, and the predicted refinement rate. The number of running >> threads is adjusted based on these periodically performed estimates. >> >> This new approach allows more dirty cards to be left in the queue until late >> in the mutator phase, typically reducing the rate of new dirty cards, which >> reduces the amount of concurrent refinement work needed. >> >> It also smooths out the number of running refinement threads, eliminating the >> unnecessarily large spikes that are common with the old method. One benefit >> is that the number of refinement threads (lazily) allocated is often much >> lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem >> described in JDK-8153225.) >> >> This change also provides a new method for calculating for the number of dirty >> cards that should be pending at the start of a GC. While this calculation is >> conceptually distinct from the thread control, the two were significanly >> intertwined in the old controller. Changing this calculation separately and >> first would have changed the behavior of the old controller in ways that might >> have introduced regressions. Changing it after the thread control was changed >> would have made it more difficult to test and measure the thread control in a >> desirable configuration. >> >> The old calculation had various problems that are described in JDK-8155996. >> In particular, it can get more or less stuck at low values, and is slow to >> respond to changes. >> >> The old controller provided a number of product options, none of which were >> very useful for real applications, and none of which are very applicable to >> the new controller. All of these are being obsoleted. 
>> >> -XX:-G1UseAdaptiveConcRefinement >> -XX:G1ConcRefinementGreenZone= >> -XX:G1ConcRefinementYellowZone= >> -XX:G1ConcRefinementRedZone= >> -XX:G1ConcRefinementThresholdStep= >> >> The new controller *could* use G1ConcRefinementGreenZone to provide a fixed >> value for the target number of cards, though it is poorly named for that. >> >> A configuration that was useful for some kinds of debugging and testing was to >> disable G1UseAdaptiveConcRefinement and set g1ConcRefinementGreenZone to a >> very large value, effectively disabling concurrent refinement. To support >> this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic >> option has been added (see JDK-8155996). >> >> The other options are meaningless for the new controller. >> >> Because of these option changes, a CSR and a release note need to accompany >> this change. >> >> Testing: >> mach5 tier1-6 >> various performance tests. >> local (linux-x64) tier1 with -XX:-G1UseConcRefinement >> >> Performance testing found no regressions, but also little or no improvement >> with default options, which was expected. With default options most of our >> performance tests do very little concurrent refinement. And even for those >> that do, while the old controller had a number of problems, the impact of >> those problems is small and hard to measure for most applications. >> >> When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare >> better, particularly when also reducing MaxGCPauseMillis. specjbb2015 with >> MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options >> held constant) showed a statistically significant improvement of about 4.5% >> for critical-jOPS. Using the changed controller, the difference between this >> configuration and the default is fairly small, while the baseline shows >> significant degradation with the more restrictive options. >> >> For all tests and configurations the new controller often creates many fewer >> refinement threads. > > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - more copyright updates > - tschatzl comments Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10256 From duke at openjdk.org Tue Oct 18 08:30:59 2022 From: duke at openjdk.org (Zixian Cai) Date: Tue, 18 Oct 2022 08:30:59 GMT Subject: RFR: 8295457: Make the signatures of write barrier methods consistent In-Reply-To: References: Message-ID: <78U7i0N24dQPR_dUY-7tfE1hF7LjN4mBl3OFBONlWtE=.c2e4f83f-d6e6-4474-8e5e-c7abec53714d@github.com> On Tue, 18 Oct 2022 08:18:56 GMT, Aleksey Shipilev wrote: > Do you want to check `macroAssembler_ppc.*`, `macroAssembler_arm.*` too? `ppc` methods are in `src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp` and are fixed in this PR. `arm` methods are already correct. Please see below. Thanks. 
https://github.com/openjdk/jdk/blob/71aa8210910dbafe30eccc772eaa7747f46be0cd/src/hotspot/cpu/arm/macroAssembler_arm.hpp#L852 https://github.com/openjdk/jdk/blob/71aa8210910dbafe30eccc772eaa7747f46be0cd/src/hotspot/cpu/arm/macroAssembler_arm.hpp#L848 ------------- PR: https://git.openjdk.org/jdk/pull/10739 From aph at openjdk.org Tue Oct 18 08:42:08 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 18 Oct 2022 08:42:08 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <8t170OCYrgUss1r0NNupyGFe3wD9WXRVbhHXAwi1YBw=.bd10d95f-a87c-46e3-942c-11665041dd32@github.com> On Tue, 18 Oct 2022 07:43:13 GMT, Florian Weimer wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic > > src/hotspot/os/linux/os_linux.cpp line 1759: > >> 1757: assert(rtn == 0, "fegetnv must succeed"); >> 1758: void * result = ::dlopen(filename, RTLD_LAZY); >> 1759: rtn = fesetenv(&curr_fenv); > > `fesetenv` in glibc does not unconditionally restore the old FPU control bits, only the non-reserved bits. (The mask is 0xf3f, which seems to cover everything in use today.) It seems unlikely that more bits are going to be defined in the future. But using `fesetenv` to essentially guard against undefined behavior after the fact is a bit awkward. > > MXSCR is passed through unconditionally. Mmm, but `fesetenv` is the only portable thing we have (AFAIK). An alternative to using `fesetenv` everywhere would be to use `fesetenv` in generic code and override it in a per-backend way. > test/hotspot/jtreg/compiler/floatingpoint/libfast-math.c line 24: > >> 22: */ >> 23: >> 24: // This file is intentionally left blank. > > This test will silently break (no longer test what's intended) once we change GCC not to link with `crtfastmath.o` for `-shared`. Maybe you should link with `crtfastmath.o` explicitly if it exists, or change the floating point control word directly in an ELF constructor. Perhaps so, yes. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From shade at openjdk.org Tue Oct 18 08:42:57 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Oct 2022 08:42:57 GMT Subject: RFR: 8295457: Make the signatures of write barrier methods consistent In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 04:18:13 GMT, Zixian Cai wrote: > Currently, the signatures for various write barrier related methods are inconsistent and can be a bit confusing. Let's take x86 as an example. > > The `store_at` of `BarrierSetAssembler` uses `dst` and `val`, and similarly for `oop_store_at` of `CardTableBarrierSetAssembler`. > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp#L50 > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.hpp#L38 > > However, `access_store_at` and `store_heap_oop` of `MacroAssembler` use `dst` and `src`, presumably copied from `access_load_at` and `load_heap_oop` respectively. 
> > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L355 > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L362 > > This PR cleans up the signature of write barrier methods across affected architectures, hopefully making them less confusing. All right then! ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.org/jdk/pull/10739 From luhenry at openjdk.org Tue Oct 18 08:50:33 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 18 Oct 2022 08:50:33 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v3] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: - fixup! Add -XX:CacheLineSize= to set cache line size - Add -XX:CacheLineSize= to set cache line size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/845d0e3b..ca8c3ac5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=01-02 Stats: 18 lines in 7 files changed: 5 ins; 6 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From aboldtch at openjdk.org Tue Oct 18 09:01:10 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 18 Oct 2022 09:01:10 GMT Subject: Integrated: 8295273: Remove unused argument in [load/store]_sized_value on aarch64 and riscv In-Reply-To: References: Message-ID: <7WbIJUa_ofeCeOGyZ9gmSw1qXPf6jab3ESWeTkbsG6s=.66cf404e-1a54-40af-83f6-8f8f6ecbb4c5@github.com> On Thu, 13 Oct 2022 12:29:39 GMT, Axel Boldt-Christmas wrote: > Remove the unused argument `Register [dst2/src2]` in [load/store]_sized_value on aarch64 and riscv ports. > > Seems like they were brought in in the initial ports [JDK-8068054](https://bugs.openjdk.org/browse/JDK-8068054) [JDK-8276799](https://bugs.openjdk.org/browse/JDK-8276799) as just straight signature copies of x86. The second register is only required on x86 for x86_32 support and is unused on riscv and aarch64. > > Should be a trivial removal. > > Testing: Cross-compiled for riscv and aarch64. Waiting for GHA This pull request has now been integrated. 
Changeset: 6553065c Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/6553065cab9ecb14390da8ec34e49aba940b213f Stats: 8 lines in 4 files changed: 0 ins; 0 del; 8 mod 8295273: Remove unused argument in [load/store]_sized_value on aarch64 and riscv Reviewed-by: fyang, haosun ------------- PR: https://git.openjdk.org/jdk/pull/10698 From aboldtch at openjdk.org Tue Oct 18 09:02:10 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 18 Oct 2022 09:02:10 GMT Subject: Integrated: 8295257: Remove implicit noreg temp register arguments in aarch64 MacroAssembler In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 06:12:49 GMT, Axel Boldt-Christmas wrote: > Remove implicit `= noreg` temporary register arguments for the three methods that still have them. > * `load_heap_oop` > * `store_heap_oop` > * `load_heap_oop_not_null` > > Only `load_heap_oop` is used with the implicit `= noreg` arguments. > After [JDK-8293351](https://bugs.openjdk.org/browse/JDK-8293351) the GCs only use explicitly passed in registers. This will also be the case for generational ZGC. Where it currently requires `load_heap_oop` to provide a second temporary register. > > Testing: linux-aarch64, macosx-aarch64 tier 1-3 This pull request has now been integrated. Changeset: a8c18ebc Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/a8c18ebc152842281b22534507b4a09612ea3498 Stats: 14 lines in 3 files changed: 0 ins; 0 del; 14 mod 8295257: Remove implicit noreg temp register arguments in aarch64 MacroAssembler Reviewed-by: aph, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/10688 From luhenry at openjdk.org Tue Oct 18 09:03:36 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 18 Oct 2022 09:03:36 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v4] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: fixup! Add -XX:CacheLineSize= to set cache line size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/ca8c3ac5..71be1b6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=02-03 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From duke at openjdk.org Tue Oct 18 09:07:54 2022 From: duke at openjdk.org (Zixian Cai) Date: Tue, 18 Oct 2022 09:07:54 GMT Subject: RFR: 8295457: Make the signatures of write barrier methods consistent [v2] In-Reply-To: References: Message-ID: > Currently, the signatures for various write barrier related methods are inconsistent and can be a bit confusing. Let's take x86 as an example. 
> > The `store_at` of `BarrierSetAssembler` uses `dst` and `val`, and similarly for `oop_store_at` of `CardTableBarrierSetAssembler`. > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp#L50 > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.hpp#L38 > > However, `access_store_at` and `store_heap_oop` of `MacroAssembler` use `dst` and `src`, presumably copied from `access_load_at` and `load_heap_oop` respectively. > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L355 > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L362 > > This PR cleans up the signature of write barrier methods across affected architectures, hopefully making them less confusing. Zixian Cai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge remote-tracking branch 'origin/master' into store_heap_oop - Fix store_heap_oop - Fix access_store_at ------------- Changes: https://git.openjdk.org/jdk/pull/10739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10739&range=01 Stats: 23 lines in 7 files changed: 0 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/10739.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10739/head:pull/10739 PR: https://git.openjdk.org/jdk/pull/10739 From luhenry at openjdk.org Tue Oct 18 09:17:09 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 18 Oct 2022 09:17:09 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v5] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: fixup! 
Add -XX:CacheLineSize= to set cache line size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/71be1b6c..de0f1a28 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From yadongwang at openjdk.org Tue Oct 18 09:39:06 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Tue, 18 Oct 2022 09:39:06 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v4] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: <24CtJ7csuki9OSb8x1wM-7Rn3-4BOmXfUVok0RkAq08=.77fbd1a6-0654-4853-9142-021b960510c8@github.com> On Tue, 18 Oct 2022 09:03:36 GMT, Ludovic Henry wrote: >> Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. >> >> [1] https://github.com/riscv/riscv-CMOs >> [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > fixup! Add -XX:CacheLineSize= to set cache line size src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4126: > 4124: srai(t1, t0, 3); > 4125: sub(cnt, cnt, t1); > 4126: add(t2, zr, zr); The usage of temporary registers needs to be made known to C2. You'd better pass arguments in and add effect in the ad file. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From duke at openjdk.org Tue Oct 18 09:49:58 2022 From: duke at openjdk.org (Zixian Cai) Date: Tue, 18 Oct 2022 09:49:58 GMT Subject: RFR: 8295457: Make the signatures of write barrier methods consistent [v2] In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 09:07:54 GMT, Zixian Cai wrote: >> Currently, the signatures for various write barrier related methods are inconsistent and can be a bit confusing. Let's take x86 as an example. >> >> The `store_at` of `BarrierSetAssembler` uses `dst` and `val`, and similarly for `oop_store_at` of `CardTableBarrierSetAssembler`. >> >> https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp#L50 >> >> https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.hpp#L38 >> >> However, `access_store_at` and `store_heap_oop` of `MacroAssembler` use `dst` and `src`, presumably copied from `access_load_at` and `load_heap_oop` respectively. >> >> https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L355 >> >> https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L362 >> >> This PR cleans up the signature of write barrier methods across affected architectures, hopefully making them less confusing. > > Zixian Cai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: > > - Merge remote-tracking branch 'origin/master' into store_heap_oop > - Fix store_heap_oop > - Fix access_store_at Not sure why the bot was complaining about rebasing. I merged the latest commits from master. ------------- PR: https://git.openjdk.org/jdk/pull/10739 From shade at openjdk.org Tue Oct 18 10:56:38 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Oct 2022 10:56:38 GMT Subject: RFR: 8295468: RISC-V: Minimal builds are broken Message-ID: <-EhZeZe4smW876ADdrvXuqg9tPbRR0zlM9yw8uNF44Y=.c6500fa8-49f6-4ec0-97a5-d81be767c914@github.com> Attempting to build RISC-V "minimal" variant fails with: * For target hotspot_variant-minimal_libjvm_objs_macroAssembler_riscv.o: In file included from /home/shade/trunks/jdk/src/hotspot/share/utilities/globalDefinitions.hpp:29, from /home/shade/trunks/jdk/src/hotspot/share/memory/allocation.hpp:29, from /home/shade/trunks/jdk/src/hotspot/share/memory/arena.hpp:28, from /home/shade/trunks/jdk/src/hotspot/share/runtime/handles.hpp:28, from /home/shade/trunks/jdk/src/hotspot/share/code/oopRecorder.hpp:28, from /home/shade/trunks/jdk/src/hotspot/share/asm/codeBuffer.hpp:28, from /home/shade/trunks/jdk/src/hotspot/share/asm/assembler.hpp:28, from /home/shade/trunks/jdk/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:28: ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/10742/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10742&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295468 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10742.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10742/head:pull/10742 PR: https://git.openjdk.org/jdk/pull/10742 From mgronlun at openjdk.org Tue Oct 18 11:15:02 2022 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 18 Oct 2022 11:15:02 GMT Subject: RFR: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events [v2] In-Reply-To: <1XyiaRUBadonxQ_XjryHn5duKVmXy931G9QleAH0iDA=.da2ccc52-50ab-491e-bc1e-7e5c133f6fcf@github.com> References: <1XyiaRUBadonxQ_XjryHn5duKVmXy931G9QleAH0iDA=.da2ccc52-50ab-491e-bc1e-7e5c133f6fcf@github.com> Message-ID: On Tue, 18 Oct 2022 00:50:42 GMT, David Holmes wrote: >> There exist a general exclusion/inclusion mechanism already. But it is an all-or-nothing proposition. This particular case is a thread that we can't exclude because it runs the periodic events, upon being notified. It is the notification mechanism to run the periodic events that trigger this large amount of unnecessary MonitorWait events. Even should we change it to some util.concurrent construct, we are only pushing the problem, because we might be instrumenting them later too. To work with the existing exclusion mechanism, the system would have to introduce an additional thread, which will be excluded, which only handles the notification, and then by some other means triggers another periodic thread (included) to run the periodic events. > > @mgronlun my request is that this filtering be done inside the commit logic by the JFR code, not at the site where the event is generated - ie this internal-jfr-event filtering is internalized into the JFR code. @dholmes-ora I agree that would be preferable indeed. Unfortunately, it cannot be done without introducing overhead to all event types and all event commit sites. 
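To make the overhead point concrete, here is a purely illustrative sketch (the names `jfr_commit`, `is_jfr_internal` and `write` are made up for illustration, not JFR's real internals) of why a filter placed in the shared commit logic would be paid by every event type and every commit site, while a check at the single wait() call that uses the chunk-rotation monitor is paid only there:

    // Hypothetical shape of a shared commit path with internal filtering.
    // Every generated event type would carry the extra branch, including
    // event types that can never be emitted by JFR's own threads.
    template <typename EventType>
    inline void jfr_commit(EventType& event) {
      if (!event.should_commit()) {
        return;                        // existing cheap enablement/threshold check
      }
      if (is_jfr_internal(event)) {    // new branch, executed on every commit
        return;
      }
      event.write();                   // serialize into the thread-local buffer
    }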
------------- PR: https://git.openjdk.org/jdk/pull/8883 From jsjolen at openjdk.org Tue Oct 18 11:48:09 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 18 Oct 2022 11:48:09 GMT Subject: RFR: 8291714: Implement a Multi-Reader Single-Writer mutex for Hotspot [v4] In-Reply-To: References: Message-ID: On Thu, 25 Aug 2022 21:41:46 GMT, Kim Barrett wrote: >> Johan Sj?len has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Fix tests for updated threadHelper >> - Remove outdated threadHelper >> - Update documentation >> - Fix outdated headers and remove dead code >> - Review comments >> - Implement MRSWMutex > > src/hotspot/share/utilities/readWriteLock.cpp line 139: > >> 137: return; >> 138: } >> 139: } > > I was wondering why a lock for write doesn't hold the underlying > PlatformMonitor across the scope of the mutex region. Something like this > (completely untested) > > > void ReadWriteLock::write_lock(Thread* current) { > _mon.lock(); > while (true) { > const int32_t count = Atomic::load_acquire(&_count); > if (count < 0) { > // Some other writer is waiting for readers to complete. > // Approx: > // while (Atomic::load_acquire(&_count)) _mon.wait(); > await_write_unlock(current); > } else if (Atomic::cmpxchg(&_count, count, -(count + 1)) == count) { > // Claimed the write slot, but there might be active readers. > if (count != 0) { > // Approx: > // while (Atomic::load_acquire(&_count) != -1) _mon.wait(); > await_no_active_readers(current); > } > return; // return still holding lock > } // else failed to claim write ownership, so try again. > } > } > > void ReadWriteLock::write_unlock() { > assert(Atomic::load(&_count) == -1, "invariant"); > Atomic::release_store(&_count, (int32_t)0); > _mon.notify_all(); > _mon.unlock(); > } > > > This seems simpler to me. Am I missing something? > > And looking at this also makes me wonder if `_mon` can just be a normal > HotSpot `Monitor`, which would resolve many of the questions and comments I've > made that are mostly driven by the direct use of `PlatformMonitor`. This lock is essentially unbounded, are there any implications regarding stopping the VM from going into safepoints? I assume not, if we're using `Monitor`. 
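A matching reader side, in the same untested shorthand as the sketch quoted above (same assumed `_count` encoding: n >= 0 means n active readers, a writer stores -(n + 1), so -1 means the writer owns the lock with no readers left; `_mon.wait()` is written without a timeout argument just like the "Approx" comments). This is not the code in the PR, only an illustration of how the reader side pairs with that scheme:

    void ReadWriteLock::read_lock(Thread* current) {
      while (true) {
        const int32_t count = Atomic::load_acquire(&_count);
        if (count < 0) {
          // A writer is active or draining readers; block until _count >= 0.
          _mon.lock();
          while (Atomic::load_acquire(&_count) < 0) {
            _mon.wait();
          }
          _mon.unlock();
        } else if (Atomic::cmpxchg(&_count, count, count + 1) == count) {
          return;  // registered as a reader
        }  // else raced with another reader or writer, retry
      }
    }

    void ReadWriteLock::read_unlock() {
      while (true) {
        const int32_t count = Atomic::load_acquire(&_count);
        assert(count != 0 && count != -1, "must hold the read lock");
        // count > 0: plain reader exit; count < -1: a writer waits for us.
        const int32_t next = (count > 0) ? (count - 1) : (count + 1);
        if (Atomic::cmpxchg(&_count, count, next) == count) {
          if (next == -1) {
            // Last reader out while a writer is parked: wake it up.
            _mon.lock();
            _mon.notify_all();
            _mon.unlock();
          }
          return;
        }
      }
    }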
------------- PR: https://git.openjdk.org/jdk/pull/9838 From stuefe at openjdk.org Tue Oct 18 12:06:17 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 18 Oct 2022 12:06:17 GMT Subject: RFR: 8295468: RISC-V: Minimal builds are broken In-Reply-To: <-EhZeZe4smW876ADdrvXuqg9tPbRR0zlM9yw8uNF44Y=.c6500fa8-49f6-4ec0-97a5-d81be767c914@github.com> References: <-EhZeZe4smW876ADdrvXuqg9tPbRR0zlM9yw8uNF44Y=.c6500fa8-49f6-4ec0-97a5-d81be767c914@github.com> Message-ID: On Tue, 18 Oct 2022 10:48:45 GMT, Aleksey Shipilev wrote: > Attempting to build RISC-V "minimal" variant fails with: > > > * For target hotspot_variant-minimal_libjvm_objs_macroAssembler_riscv.o: > In file included from /home/shade/trunks/jdk/src/hotspot/share/utilities/globalDefinitions.hpp:29, > from /home/shade/trunks/jdk/src/hotspot/share/memory/allocation.hpp:29, > from /home/shade/trunks/jdk/src/hotspot/share/memory/arena.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/runtime/handles.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/code/oopRecorder.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/asm/codeBuffer.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/asm/assembler.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:28: +1 trivial ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/10742 From jnordstrom at openjdk.org Tue Oct 18 12:16:00 2022 From: jnordstrom at openjdk.org (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Tue, 18 Oct 2022 12:16:00 GMT Subject: RFR: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events [v2] In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 08:52:11 GMT, Joakim Nordstr?m wrote: >> Changed the JFR chunk rotation lock object to specific internal class. This allows that specific Object.wait() event to be skipped, thus not adding JFR internal noise to recordings. >> >> # Testing >> - jdk_jfr > > Joakim Nordstr?m has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Merge branch 'master' into JDK-8286707-jfr-dont-commit-jfr-internal-jdk-javamonitorwait-events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events Would it be possible to add this filtering as part of the throttling mechanism? I guess it would push the meaning of throttling, and might be overkill for this specific use-case, but perhaps some new internal throttle-like approach could be implemented? I'll do some experimentation and see what I might come up with. 
------------- PR: https://git.openjdk.org/jdk/pull/8883 From luhenry at openjdk.org Tue Oct 18 12:44:18 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 18 Oct 2022 12:44:18 GMT Subject: RFR: 8295468: RISC-V: Minimal builds are broken In-Reply-To: <-EhZeZe4smW876ADdrvXuqg9tPbRR0zlM9yw8uNF44Y=.c6500fa8-49f6-4ec0-97a5-d81be767c914@github.com> References: <-EhZeZe4smW876ADdrvXuqg9tPbRR0zlM9yw8uNF44Y=.c6500fa8-49f6-4ec0-97a5-d81be767c914@github.com> Message-ID: On Tue, 18 Oct 2022 10:48:45 GMT, Aleksey Shipilev wrote: > Attempting to build RISC-V "minimal" variant fails with: > > > * For target hotspot_variant-minimal_libjvm_objs_macroAssembler_riscv.o: > In file included from /home/shade/trunks/jdk/src/hotspot/share/utilities/globalDefinitions.hpp:29, > from /home/shade/trunks/jdk/src/hotspot/share/memory/allocation.hpp:29, > from /home/shade/trunks/jdk/src/hotspot/share/memory/arena.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/runtime/handles.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/code/oopRecorder.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/asm/codeBuffer.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/asm/assembler.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:28: Marked as reviewed by luhenry (Author). ------------- PR: https://git.openjdk.org/jdk/pull/10742 From luhenry at openjdk.org Tue Oct 18 12:50:52 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 18 Oct 2022 12:50:52 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v4] In-Reply-To: <24CtJ7csuki9OSb8x1wM-7Rn3-4BOmXfUVok0RkAq08=.77fbd1a6-0654-4853-9142-021b960510c8@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <24CtJ7csuki9OSb8x1wM-7Rn3-4BOmXfUVok0RkAq08=.77fbd1a6-0654-4853-9142-021b960510c8@github.com> Message-ID: On Tue, 18 Oct 2022 09:35:25 GMT, Yadong Wang wrote: >> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: >> >> fixup! Add -XX:CacheLineSize= to set cache line size > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4126: > >> 4124: srai(t1, t0, 3); >> 4125: sub(cnt, cnt, t1); >> 4126: add(t2, zr, zr); > > The usage of temporary registers needs to be made known to C2. You'd better pass arguments in and add effect in the ad file. Given it's only made to be called from `StubRoutine::zero_blocks` stub routine and `t0-t1` are temporary registers and `t2` (aka `x7`) is caller-saved, I don't understand why it needs to be made aware for C2? I'll add them as `tmp0-tmp3` arguments to `MacroAssembler::dcache_zero_blocks` to make sure any future caller of this will be aware. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From stefank at openjdk.org Tue Oct 18 13:04:57 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 18 Oct 2022 13:04:57 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj Message-ID: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Background to this patch: This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. 
do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc.

PR RFC:

HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies:

MetaspaceObj - allocates in the Metaspace
CHeap - uses malloc
ResourceObj - ...

The last class sounds like it provides an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Arenas or even CHeap allocated memory.

This is IMHO misleading, and often leads to confusion among HotSpot developers.

I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj.

In my proposal and prototype I've used the name AnyObj, as a short, simple name. I'm open to changing the name to something else.

The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas.

The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap.

-------------

Commit messages:
 - Remove AnyObj new operator taking an allocation_type
 - Use more specific allocation types

Changes: https://git.openjdk.org/jdk/pull/10745/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10745&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8295475
Stats: 458 lines in 152 files changed: 67 ins; 37 del; 354 mod
Patch: https://git.openjdk.org/jdk/pull/10745.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10745/head:pull/10745

PR: https://git.openjdk.org/jdk/pull/10745

From yadongwang at openjdk.org  Tue Oct 18 13:13:00 2022
From: yadongwang at openjdk.org (Yadong Wang)
Date: Tue, 18 Oct 2022 13:13:00 GMT
Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v4]
In-Reply-To:
References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <24CtJ7csuki9OSb8x1wM-7Rn3-4BOmXfUVok0RkAq08=.77fbd1a6-0654-4853-9142-021b960510c8@github.com>
Message-ID: <_-asW3wRumLQ-iYayP1ULZH619Y8Upg_gjNjgUMnZAY=.8510e2dc-f782-4f13-9a89-3cc510e7870f@github.com>

On Tue, 18 Oct 2022 12:46:33 GMT, Ludovic Henry wrote:

>> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4126:
>>
>>> 4124: srai(t1, t0, 3);
>>> 4125: sub(cnt, cnt, t1);
>>> 4126: add(t2, zr, zr);
>>
>> The usage of temporary registers needs to be made known to C2. You'd better pass arguments in and add effect in the ad file.
>
> Given it's only made to be called from `StubRoutine::zero_blocks` stub routine and `t0-t1` are temporary registers and `t2` (aka `x7`) is caller-saved, I don't understand why it needs to be made aware for C2?
>
> I'll add them as `tmp0-tmp3` arguments to `MacroAssembler::dcache_zero_blocks` to make sure any future caller of this will be aware.

C2 generates ClearArrayNodes, which emit zero_words -> zero_blocks directly. I don't find any caller-saving logic there. t0 is free to use (not participating in register allocation), but t1 is used as condition code in C2. base and cnt registers can be clobbered safely because they were identified as USE_KILL.
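Still on the Zicboz thread: the quoted lines come from the zeroing loop itself, and schematically the cbo.zero path boils down to something like the following. This is an illustration only — `cbo_zero()` stands for the emitter the patch is assumed to add for the Zicboz instruction, and the register choices (`line` holding the cache line size in bytes, `t0` as a scratch register) are placeholders rather than the actual code:

    Label loop;
    srli(t0, line, 3);         // words cleared per cbo.zero (line size / 8)
    bind(loop);
      cbo_zero(base);          // zero one full cache line starting at base
      add(base, base, line);   // advance to the next cache line
      sub(cnt, cnt, t0);       // cnt counts remaining 64-bit words
      bgeu(cnt, t0, loop);     // keep going while a full line remains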
------------- PR: https://git.openjdk.org/jdk/pull/10718 From jsjolen at openjdk.org Tue Oct 18 13:21:28 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 18 Oct 2022 13:21:28 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v2] In-Reply-To: References: Message-ID: > Hi! > > This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Fix review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10645/files - new: https://git.openjdk.org/jdk/pull/10645/files/1902d3f0..46429fb0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10645&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10645&range=00-01 Stats: 66 lines in 4 files changed: 29 ins; 30 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/10645.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10645/head:pull/10645 PR: https://git.openjdk.org/jdk/pull/10645 From jsjolen at openjdk.org Tue Oct 18 13:21:29 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 18 Oct 2022 13:21:29 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v2] In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 13:18:05 GMT, Johan Sj?len wrote: >> Hi! >> >> This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Fix review comments Finished up the comments that David left, thanks for the review! ------------- PR: https://git.openjdk.org/jdk/pull/10645 From jsjolen at openjdk.org Tue Oct 18 13:21:29 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 18 Oct 2022 13:21:29 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL In-Reply-To: References: Message-ID: <93XWx5DekmtJGVFjizGpyR5elvbplVHqsAITJeLc8yc=.24a57a21-269c-454e-964f-d9aa2ccb5028@github.com> On Tue, 11 Oct 2022 09:21:46 GMT, Johan Sj?len wrote: > Hi! > > This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. Okay, Github's UI tricked me a bit here. I thought my comments would be applied as replies to David's comments, but they're separate... 
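For readers following the WizardMode discussion above, the UL-side guard pattern being referred to usually has this shape (an illustrative sketch, not the patch itself; `vf` is just a stand-in for whichever vframe is being printed):

    LogTarget(Trace, deoptimization) lt;
    if (lt.is_enabled()) {
      ResourceMark rm;
      LogStream ls(lt);
      vf->print_on(&ls);  // detailed output that used to be gated by WizardMode
    }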
------------- PR: https://git.openjdk.org/jdk/pull/10645 From jsjolen at openjdk.org Tue Oct 18 13:21:30 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 18 Oct 2022 13:21:30 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v2] In-Reply-To: <_mlD5HyVj54tSXfOJY7bQi7NHxiwk2hhRpCkja2xHfg=.890bcecc-64dc-4d41-a656-f27aef4cea29@github.com> References: <_mlD5HyVj54tSXfOJY7bQi7NHxiwk2hhRpCkja2xHfg=.890bcecc-64dc-4d41-a656-f27aef4cea29@github.com> Message-ID: <2Yp4Lel70VRt7OpBxukC9lYzZswJjvulX3bTVyezciA=.b1d71eb2-9f08-4801-953d-e2103a5c3316@github.com> On Tue, 11 Oct 2022 23:53:10 GMT, David Holmes wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix review comments > > src/hotspot/share/runtime/deoptimization.cpp line 441: > >> 439: if (trap_scope->rethrow_exception()) { >> 440: #ifndef PRODUCT >> 441: log_debug(deoptimization)("Exception to be rethrown in the interpreter for method %s::%s at bci %d", trap_scope->method()->method_holder()->name()->as_C_string(), trap_scope->method()->name()->as_C_string(), trap_scope->bci()); > > While you are here could you break this up into three lines please. Fixed. > src/hotspot/share/runtime/deoptimization.cpp line 1472: > >> 1470: { >> 1471: LogMessage(deoptimization) lm; >> 1472: if (lm.is_debug()) { > > Should this be trace level to match the fact you also needed Verbose before? It should be, fixed. > src/hotspot/share/runtime/vframe.cpp line 681: > >> 679: #ifndef PRODUCT >> 680: void vframe::print() { >> 681: if (WizardMode) print_on(tty); > > You don't need the `WizardMode` guard here and in `print_on`. UL is supposed replace WizardMode and Verbose so the correct fix would be to elide it from `print_on` and only log using `print_on` when the logging level is logically equivalent to `WizardMode` (as you have done elsewhere). This sounds OK to me. > src/hotspot/share/runtime/vframeArray.cpp line 380: > >> 378: #ifndef PRODUCT >> 379: log_debug(deoptimization)("Locals size: %d", locals()->size()); >> 380: auto log_it = [](int i, intptr_t* addr) { > > Again I don't see why this can't just be inline Fixed > test/hotspot/gtest/logging/test_logStream.cpp line 158: > >> 156: EXPECT_TRUE(file_contains_substring(TestLogFileName, "ABCD\n")); >> 157: } >> 158: > > Inadvertent removal? It's the last line, so it's a spurious newline to me. > test/hotspot/jtreg/compiler/uncommontrap/TestDeoptOOM.java line 45: > >> 43: * -XX:CompileCommand=exclude,compiler.uncommontrap.TestDeoptOOM::m9_1 >> 44: * -XX:+UnlockDiagnosticVMOptions >> 45: * -XX:+UseZGC -XX:+LogCompilation -XX:+TraceDeoptimization -XX:+Verbose > > Why isn't this enabling deoptimization logging? Indeed, I added it as deoptimization=trace as it has verbose turned on. ------------- PR: https://git.openjdk.org/jdk/pull/10645 From jsjolen at openjdk.org Tue Oct 18 13:21:30 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 18 Oct 2022 13:21:30 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v2] In-Reply-To: References: <-gHZi1XpPZBoUl_RS2ETKpgODYvyiUvz5hi2AQlMoDQ=.d0a31f28-04e4-4ae9-b8f4-76415a154853@github.com> Message-ID: <8Br3Z5M-GPdcm858TaD0GgFmvK-hyCyPMVU_DemczdI=.83e1ba0b-7edc-4492-9025-27040959df3b@github.com> On Thu, 13 Oct 2022 02:24:40 GMT, David Holmes wrote: >> I should've said "crosses initialization error", not "crossing." 
>> >> Check out this SO question: https://stackoverflow.com/questions/11578936/getting-a-bunch-of-crosses-initialization-error >> >> So `LogTarget(Debug, deoptimization) lt;` is the error here. I *think* that we can inline it if we introduce a surrounding scope, is that preferable to you? >> >> So: >> >> >> case T_OBJECT: >> *addr = value->get_int(T_OBJECT); >> { // Scope off LogTarget >> LogTarget(Debug, deoptimization) lt; >> if (lt.is_enabled()) { >> LogStream ls(lt); >> ls.print(" - Reconstructed expression %d (OBJECT): ", i); >> oop o = cast_to_oop((address)(*addr)); >> if (o == NULL) { >> ls.print_cr("NULL"); >> } else { >> ResourceMark rm; >> ls.print_raw_cr(o->klass()->name()->as_C_string()); >> } >> } >> } > > Yes adding the scope is fine and preferable. Thanks. Fixed ------------- PR: https://git.openjdk.org/jdk/pull/10645 From stefank at openjdk.org Tue Oct 18 13:42:40 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 18 Oct 2022 13:42:40 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v2] In-Reply-To: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: > Background to this patch: > > This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. > > PR RFC: > > HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: > > MetaspaceObj - allocates in the Metaspace > CHeap - uses malloc > ResourceObj - ... > > The last class sounds like it provide an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Areanas or even CHeap allocated memory. > > This is IMHO misleading, and often leads to confusion among HotSpot developers. > > I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. > > In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. > > The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. > > The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Fix Shenandoah ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10745/files - new: https://git.openjdk.org/jdk/pull/10745/files/bafa0229..4e8ac797 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10745&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10745&range=00-01 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10745.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10745/head:pull/10745 PR: https://git.openjdk.org/jdk/pull/10745 From mgronlun at openjdk.org Tue Oct 18 14:02:01 2022 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 18 Oct 2022 14:02:01 GMT Subject: RFR: 8280131: jcmd reports "Module jdk.jfr not found." when "jdk.management.jfr" is missing In-Reply-To: References: Message-ID: On Mon, 17 Oct 2022 09:25:57 GMT, Erik Gahlin wrote: > Could I have a review of a PR that ensures JFR can be used when only the jdk.jfr module is present in an image. > > The behavior is similar to how -javaagent adds the java.instrument module and "jcmd PID ManagementAgent.status" loads the jdk.management.agent module. > > TestJfrJavaBase.java is replaced with TestModularImage.java. The former test could not be used since the jdk.jfr module is now added to the module graph when -XX:StartFlightRecording is specified. > > Testing: tier1-3 + test/jdk/jdk/jfr > > Thanks > Erik Marked as reviewed by mgronlun (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10723 From stuefe at openjdk.org Tue Oct 18 14:44:00 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 18 Oct 2022 14:44:00 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v2] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On Tue, 18 Oct 2022 13:42:40 GMT, Stefan Karlsson wrote: >> Background to this patch: >> >> This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. >> >> PR RFC: >> >> HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: >> >> MetaspaceObj - allocates in the Metaspace >> CHeap - uses malloc >> ResourceObj - ... >> >> The last class sounds like it provide an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Areanas or even CHeap allocated memory. >> >> This is IMHO misleading, and often leads to confusion among HotSpot developers. >> >> I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. >> >> In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. 
>>
>> The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas.
>>
>> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap.
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix Shenandoah

I like this in principle, I always found the "and also C-heap" usage of ResourceObj odd.

But deciding allocation area via inheritance feels a bit restrictive. Since I then cannot have instances of a class live in different areas, I need to decide at class design time where it will live, which also has implications for allowed lifetime (e.g. no RA before VM initialization). In practice, it never was a big deal, since everything can live on the stack or via composition in other objects, and ResourceObj can live on the C-heap too. But with the proposed patch ResourceObj would only live on RA, and cannot live on the C-heap nor in hand-created Arenas.

I admit I have no good solution either. We could make allocation more flexible by accumulating all operator new variants in just a single base class, AnyObj. The problem is that we'd need to track the allocation type at runtime. So we'd trade performance for flexibility.

-------------

PR: https://git.openjdk.org/jdk/pull/10745

From fyang at openjdk.org  Tue Oct 18 15:00:00 2022
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 18 Oct 2022 15:00:00 GMT
Subject: RFR: 8295468: RISC-V: Minimal builds are broken
In-Reply-To: <-EhZeZe4smW876ADdrvXuqg9tPbRR0zlM9yw8uNF44Y=.c6500fa8-49f6-4ec0-97a5-d81be767c914@github.com>
References: <-EhZeZe4smW876ADdrvXuqg9tPbRR0zlM9yw8uNF44Y=.c6500fa8-49f6-4ec0-97a5-d81be767c914@github.com>
Message-ID: <30L9nXmTJN74oDxME534zOdQ3sLLlr1ZU5dCTz9T8Uk=.c797a23f-b112-48a4-b55d-48e6c09ab91c@github.com>

On Tue, 18 Oct 2022 10:48:45 GMT, Aleksey Shipilev wrote:

> Attempting to build RISC-V "minimal" variant fails with:
>
>
> * For target hotspot_variant-minimal_libjvm_objs_macroAssembler_riscv.o:
> In file included from /home/shade/trunks/jdk/src/hotspot/share/utilities/globalDefinitions.hpp:29,
> from /home/shade/trunks/jdk/src/hotspot/share/memory/allocation.hpp:29,
> from /home/shade/trunks/jdk/src/hotspot/share/memory/arena.hpp:28,
> from /home/shade/trunks/jdk/src/hotspot/share/runtime/handles.hpp:28,
> from /home/shade/trunks/jdk/src/hotspot/share/code/oopRecorder.hpp:28,
> from /home/shade/trunks/jdk/src/hotspot/share/asm/codeBuffer.hpp:28,
> from /home/shade/trunks/jdk/src/hotspot/share/asm/assembler.hpp:28,
> from /home/shade/trunks/jdk/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:28:

I see this problem only manifest itself in a minimal debug build. I just realized that I am always doing minimal release build and that's why I missed it. Thanks.

-------------

Marked as reviewed by fyang (Reviewer).
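Coming back to the ResourceObj/AnyObj RFC above, a rough sketch of how the three proposed base classes would be used. The class names ResourceObj, ArenaObj and AnyObj come from the RFC itself; the example class names, the exact `operator new` spellings and the `mtInternal` memory flag are assumptions for illustration, not code from the patch:

    class PathEntry : public ResourceObj { };  // resource-area lifetime only
    class LoopInfo  : public ArenaObj    { };  // lives in a caller-supplied arena
    class NameCache : public AnyObj      { };  // strategy chosen at allocation time

    void example(Arena* arena) {
      ResourceMark rm;
      PathEntry* p = new PathEntry();               // current thread's resource area
      LoopInfo*  l = new (arena) LoopInfo();        // the given arena
      NameCache* c = new (mtInternal) NameCache();  // MEMFLAGS argument => C heap
      delete c;  // only the CHeap-allocated object is freed explicitly
    }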
PR: https://git.openjdk.org/jdk/pull/10742 From shade at openjdk.org Tue Oct 18 15:10:36 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Oct 2022 15:10:36 GMT Subject: RFR: 8295468: RISC-V: Minimal builds are broken In-Reply-To: <-EhZeZe4smW876ADdrvXuqg9tPbRR0zlM9yw8uNF44Y=.c6500fa8-49f6-4ec0-97a5-d81be767c914@github.com> References: <-EhZeZe4smW876ADdrvXuqg9tPbRR0zlM9yw8uNF44Y=.c6500fa8-49f6-4ec0-97a5-d81be767c914@github.com> Message-ID: On Tue, 18 Oct 2022 10:48:45 GMT, Aleksey Shipilev wrote: > Attempting to build RISC-V "minimal" variant fails with: > > > * For target hotspot_variant-minimal_libjvm_objs_macroAssembler_riscv.o: > In file included from /home/shade/trunks/jdk/src/hotspot/share/utilities/globalDefinitions.hpp:29, > from /home/shade/trunks/jdk/src/hotspot/share/memory/allocation.hpp:29, > from /home/shade/trunks/jdk/src/hotspot/share/memory/arena.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/runtime/handles.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/code/oopRecorder.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/asm/codeBuffer.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/asm/assembler.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:28: Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/10742 From shade at openjdk.org Tue Oct 18 15:10:37 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Oct 2022 15:10:37 GMT Subject: Integrated: 8295468: RISC-V: Minimal builds are broken In-Reply-To: <-EhZeZe4smW876ADdrvXuqg9tPbRR0zlM9yw8uNF44Y=.c6500fa8-49f6-4ec0-97a5-d81be767c914@github.com> References: <-EhZeZe4smW876ADdrvXuqg9tPbRR0zlM9yw8uNF44Y=.c6500fa8-49f6-4ec0-97a5-d81be767c914@github.com> Message-ID: On Tue, 18 Oct 2022 10:48:45 GMT, Aleksey Shipilev wrote: > Attempting to build RISC-V "minimal" variant fails with: > > > * For target hotspot_variant-minimal_libjvm_objs_macroAssembler_riscv.o: > In file included from /home/shade/trunks/jdk/src/hotspot/share/utilities/globalDefinitions.hpp:29, > from /home/shade/trunks/jdk/src/hotspot/share/memory/allocation.hpp:29, > from /home/shade/trunks/jdk/src/hotspot/share/memory/arena.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/runtime/handles.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/code/oopRecorder.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/asm/codeBuffer.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/share/asm/assembler.hpp:28, > from /home/shade/trunks/jdk/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:28: This pull request has now been integrated. Changeset: e7375f9c Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/e7375f9c527fd86dc1414a308a440903fb9f22da Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod 8295468: RISC-V: Minimal builds are broken Reviewed-by: stuefe, luhenry, fyang ------------- PR: https://git.openjdk.org/jdk/pull/10742 From chagedorn at openjdk.org Tue Oct 18 15:24:04 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 18 Oct 2022 15:24:04 GMT Subject: RFR: 8293422: DWARF emitted by Clang cannot be parsed [v4] In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 08:18:08 GMT, Christian Hagedorn wrote: >> The DWARF debugging symbols emitted by Clang is different from what GCC is emitting. While GCC produces a complete `.debug_aranges` section (which is required in the DWARF parser), Clang does not. 
As a result, the DWARF parser cannot find the necessary information to proceed and create the line number information: >> >> The `.debug_aranges` section contains address range to compilation unit offset mappings. The parsing algorithm can just walk through all these entries to find the correct address range that contains the library offset of the current pc. This gives us the compilation unit offset into the `.debug_info` section from where we can proceed to parse the line number information. >> >> Without a complete `.debug_aranges` section, we fail with an assertion that we could not find the correct entry. Since [JDK-8293402](https://bugs.openjdk.org/browse/JDK-8293402), we will still get the complete stack trace at least. Nevertheless, we should still fix this assertion failure of course. But that would require a different parsing approach. We need to parse the entire `.debug_info` section instead to get to the correct compilation unit. This, however, would require a lot more work. >> >> I therefore suggest to disable DWARF parsing for Clang for now and file an RFE to support Clang in the future with a different parsing approach. I'm using the `__clang__` `ifdef` to bail out in `get_source_info()` and disable the `gtests`. I've noticed that we are currently running the `gtests` with `NOT PRODUCT` which I think is not necessary - the gtests should also work fine with product builds. I've corrected this as well but that could also be done separately. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Always read full filename and strip prefix path and only then cut filename to fit output buffer > - Merge branch 'master' into JDK-8293422 > - Merge branch 'master' into JDK-8293422 > - Review comments from Thomas > - Change old bailout fix to only apply to Clang versions older than 5.0 and add new fix with -gdwarf-aranges + -gdwarf-4 for Clang 5.0+ > - 8293422: DWARF emitted by Clang cannot be parsed Thanks Magnus for your review of the build changes! May a get a second review of the DWARF parser code changes? Thanks, Christian ------------- PR: https://git.openjdk.org/jdk/pull/10287 From duke at openjdk.org Tue Oct 18 15:35:04 2022 From: duke at openjdk.org (Zixian Cai) Date: Tue, 18 Oct 2022 15:35:04 GMT Subject: Integrated: 8295457: Make the signatures of write barrier methods consistent In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 04:18:13 GMT, Zixian Cai wrote: > Currently, the signatures for various write barrier related methods are inconsistent and can be a bit confusing. Let's take x86 as an example. > > The `store_at` of `BarrierSetAssembler` uses `dst` and `val`, and similarly for `oop_store_at` of `CardTableBarrierSetAssembler`. > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.hpp#L50 > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.hpp#L38 > > However, `access_store_at` and `store_heap_oop` of `MacroAssembler` use `dst` and `src`, presumably copied from `access_load_at` and `load_heap_oop` respectively. 
> > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L355 > > https://github.com/openjdk/jdk/blob/358ac07255cc640cbcb9b0df5302d97891a34087/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L362 > > This PR cleans up the signature of write barrier methods across affected architectures, hopefully making them less confusing. This pull request has now been integrated. Changeset: 5dbd4951 Author: Zixian Cai Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/5dbd49511518819acbbff9968cdf426af759cf2c Stats: 23 lines in 7 files changed: 0 ins; 0 del; 23 mod 8295457: Make the signatures of write barrier methods consistent Reviewed-by: tschatzl, shade ------------- PR: https://git.openjdk.org/jdk/pull/10739 From iklam at openjdk.org Tue Oct 18 15:39:08 2022 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 18 Oct 2022 15:39:08 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v3] In-Reply-To: References: Message-ID: > Some `JVM_CONSTANT_Class` entries are guaranteed to resolve to the same value at both CDS dump time and run time: > > - Classes that are resolved during `vmClasses::resolve_all()`. These classes cannot be replaced by JVMTI agents at run time. > - Supertypes -- at run time, a class `C` can be loaded from the CDS archive only if all of `C`'s super types are also loaded from the CDS archive. Therefore, we know that a `JVM_CONSTANT_Class` reference to a supertype of `C` must resolved to the same value at both CDS dump time and run time. > > By doing the resolution at dump time, we can speed up run time start-up by a little bit. > > The `ClassPrelinker` class added by this PR will also be used in future REFs for pre-resolving other constant pool entries. The ultimate goal is to resolve `invokedynamic` and `invokehandle` so we can significantly improve the start-up time of features such as Lambda expressions and String concatenation. See [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed product build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10330/files - new: https://git.openjdk.org/jdk/pull/10330/files/9bf8cb4c..980802d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10330&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10330&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10330.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10330/head:pull/10330 PR: https://git.openjdk.org/jdk/pull/10330 From ihse at openjdk.org Tue Oct 18 15:45:01 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 18 Oct 2022 15:45:01 GMT Subject: RFR: 8295470: Update openjdk.java.net => openjdk.org URLs in test code Message-ID: This is a continuation of the effort to update all our URLs to the new top-level domain. This patch updates (most) URLs in testing code. There still exists references to openjdk.java.net, but that are not strictly used as normal URLs, which I deemed need special care, so I left them out of this one, which is more of a straight-forward search and replace. I have manually verified that the links work (or points to bugs.openjdk.org and looks sane; I did not click on all those). I have replaced `http` with `https`. I have replaced links to specific commits on the mercurial server with links to the corresponding commits in the new git repos. 
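The mechanical part of a change like this is typically a tree-wide substitution along the following lines, with the non-URL occurrences then reviewed by hand as described above (illustrative only; not necessarily the exact commands used for this PR):

    git grep -lF 'openjdk.java.net' -- test | \
        xargs sed -i 's/openjdk\.java\.net/openjdk.org/g'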
------------- Commit messages: - 8295470: Update openjdk.java.net => openjdk.org URLs in test code Changes: https://git.openjdk.org/jdk/pull/10744/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10744&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295470 Stats: 138 lines in 45 files changed: 46 ins; 0 del; 92 mod Patch: https://git.openjdk.org/jdk/pull/10744.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10744/head:pull/10744 PR: https://git.openjdk.org/jdk/pull/10744 From ihse at openjdk.org Tue Oct 18 15:45:02 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 18 Oct 2022 15:45:02 GMT Subject: RFR: 8295470: Update openjdk.java.net => openjdk.org URLs in test code In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 11:55:06 GMT, Magnus Ihse Bursie wrote: > This is a continuation of the effort to update all our URLs to the new top-level domain. > > This patch updates (most) URLs in testing code. There still exists references to openjdk.java.net, but that are not strictly used as normal URLs, which I deemed need special care, so I left them out of this one, which is more of a straight-forward search and replace. > > I have manually verified that the links work (or points to bugs.openjdk.org and looks sane; I did not click on all those). I have replaced `http` with `https`. I have replaced links to specific commits on the mercurial server with links to the corresponding commits in the new git repos. I noticed that a couple of files where missing copyright headers, when I tried to update those. I did some code archaeology and found out which year they were created, and added a copyright header. Just to be reasonably sure that I did not affect any tests, I have let this go through the GHA and our internal CI testing tier1-3. The failed GHA test is `SuperWaitTest`, which I did not modify. ------------- PR: https://git.openjdk.org/jdk/pull/10744 From michaelm at openjdk.org Tue Oct 18 16:03:57 2022 From: michaelm at openjdk.org (Michael McMahon) Date: Tue, 18 Oct 2022 16:03:57 GMT Subject: RFR: 8295470: Update openjdk.java.net => openjdk.org URLs in test code In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 11:55:06 GMT, Magnus Ihse Bursie wrote: > This is a continuation of the effort to update all our URLs to the new top-level domain. > > This patch updates (most) URLs in testing code. There still exists references to openjdk.java.net, but that are not strictly used as normal URLs, which I deemed need special care, so I left them out of this one, which is more of a straight-forward search and replace. > > I have manually verified that the links work (or points to bugs.openjdk.org and looks sane; I did not click on all those). I have replaced `http` with `https`. I have replaced links to specific commits on the mercurial server with links to the corresponding commits in the new git repos. net changes look fine ------------- Marked as reviewed by michaelm (Reviewer). PR: https://git.openjdk.org/jdk/pull/10744 From prr at openjdk.org Tue Oct 18 16:51:10 2022 From: prr at openjdk.org (Phil Race) Date: Tue, 18 Oct 2022 16:51:10 GMT Subject: RFR: 8295470: Update openjdk.java.net => openjdk.org URLs in test code In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 11:55:06 GMT, Magnus Ihse Bursie wrote: > This is a continuation of the effort to update all our URLs to the new top-level domain. > > This patch updates (most) URLs in testing code. 
There still exists references to openjdk.java.net, but that are not strictly used as normal URLs, which I deemed need special care, so I left them out of this one, which is more of a straight-forward search and replace. > > I have manually verified that the links work (or points to bugs.openjdk.org and looks sane; I did not click on all those). I have replaced `http` with `https`. I have replaced links to specific commits on the mercurial server with links to the corresponding commits in the new git repos. Marked as reviewed by prr (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10744 From kvn at openjdk.org Tue Oct 18 17:10:16 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Oct 2022 17:10:16 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v2] In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 13:21:28 GMT, Johan Sjölen wrote: >> Hi! >> >> This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Fix review comments Nice work. Can you show an example of new output vs old? test/hotspot/jtreg/compiler/uncommontrap/TestDeoptOOM.java line 45: > 43: * -XX:CompileCommand=exclude,compiler.uncommontrap.TestDeoptOOM::m9_1 > 44: * -XX:+UnlockDiagnosticVMOptions > 45: * -XX:+UseZGC -XX:+LogCompilation -Xlog:disable -Xlog:deoptimization=trace -XX:+TraceDeoptimization -XX:+Verbose Why you need `-Xlog:disable` ? ------------- PR: https://git.openjdk.org/jdk/pull/10645 From dcubed at openjdk.org Tue Oct 18 17:11:26 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 18 Oct 2022 17:11:26 GMT Subject: RFR: 8284614: on macOS "spindump" should be run from failure_handler as root In-Reply-To: <3DgeklO8yKV5sBiixQbD5piICBaLXz2Zy6GqJrb06Vc=.e78f565e-77c9-47ba-99c0-0a0bc8020b6b@github.com> References: <3DgeklO8yKV5sBiixQbD5piICBaLXz2Zy6GqJrb06Vc=.e78f565e-77c9-47ba-99c0-0a0bc8020b6b@github.com> Message-ID: On Mon, 17 Oct 2022 16:35:16 GMT, Leonid Mesnik wrote: > The fix is contributed by @plummercj actually. Marked as reviewed by dcubed (Reviewer). Thumbs up. ------------- PR: https://git.openjdk.org/jdk/pull/10730 From dcubed at openjdk.org Tue Oct 18 17:17:01 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 18 Oct 2022 17:17:01 GMT Subject: RFR: 8284614: on macOS "spindump" should be run from failure_handler as root In-Reply-To: <3DgeklO8yKV5sBiixQbD5piICBaLXz2Zy6GqJrb06Vc=.e78f565e-77c9-47ba-99c0-0a0bc8020b6b@github.com> References: <3DgeklO8yKV5sBiixQbD5piICBaLXz2Zy6GqJrb06Vc=.e78f565e-77c9-47ba-99c0-0a0bc8020b6b@github.com> Message-ID: <1CzIJdtbmWkh50D7AxJwIiOOmB-1TvdYtLKj2S4TY7s=.4cd77feb-ccdd-4dd4-bf16-b0c8efb625dc@github.com> On Mon, 17 Oct 2022 16:35:16 GMT, Leonid Mesnik wrote: > The fix is contributed by @plummercj actually. Folks who run testing on their personal Mac machines will want to update: $ cat /etc/sudoers.d/sudoers <username> ALL=(ALL) NOPASSWD: /sbin/dmesg, /usr/sbin/spindump Where "<username>" is replaced by your local username, e.g. "dcubed". @tbell29552 - There may be a need to update Ansible playbooks for the macOS machines in Mach5.
------------- PR: https://git.openjdk.org/jdk/pull/10730 From darcy at openjdk.org Tue Oct 18 17:24:02 2022 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 18 Oct 2022 17:24:02 GMT Subject: RFR: 8295470: Update openjdk.java.net => openjdk.org URLs in test code In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 11:55:06 GMT, Magnus Ihse Bursie wrote: > This is a continuation of the effort to update all our URLs to the new top-level domain. > > This patch updates (most) URLs in testing code. There still exists references to openjdk.java.net, but that are not strictly used as normal URLs, which I deemed need special care, so I left them out of this one, which is more of a straight-forward search and replace. > > I have manually verified that the links work (or points to bugs.openjdk.org and looks sane; I did not click on all those). I have replaced `http` with `https`. I have replaced links to specific commits on the mercurial server with links to the corresponding commits in the new git repos. Marked as reviewed by darcy (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10744 From lmesnik at openjdk.org Tue Oct 18 17:37:32 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 18 Oct 2022 17:37:32 GMT Subject: Integrated: 8284614: on macOS "spindump" should be run from failure_handler as root In-Reply-To: <3DgeklO8yKV5sBiixQbD5piICBaLXz2Zy6GqJrb06Vc=.e78f565e-77c9-47ba-99c0-0a0bc8020b6b@github.com> References: <3DgeklO8yKV5sBiixQbD5piICBaLXz2Zy6GqJrb06Vc=.e78f565e-77c9-47ba-99c0-0a0bc8020b6b@github.com> Message-ID: On Mon, 17 Oct 2022 16:35:16 GMT, Leonid Mesnik wrote: > The fix is contributed by @plummercj actually. This pull request has now been integrated. Changeset: 0233ba76 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/0233ba763d84e6da8ec03df5d021a13c5fbbc871 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8284614: on macOS "spindump" should be run from failure_handler as root Co-authored-by: Chris Plummer Reviewed-by: dnsimon, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/10730 From dlong at openjdk.org Tue Oct 18 17:47:09 2022 From: dlong at openjdk.org (Dean Long) Date: Tue, 18 Oct 2022 17:47:09 GMT Subject: RFR: 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() Message-ID: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> This change adds a missing is_unloading() check for the callee in SharedRuntime::fixup_callers_callsite() and removes from_compiled_entry_no_trampoline() because it is no longer used. 
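To make the intent of the guard concrete, here is a small standalone sketch with made-up types (FakeNMethod and maybe_patch_callsite are illustrative only and do not match the real HotSpot signatures):

#include <cstdio>

// Simplified stand-in for an nmethod: the real class tracks whether its
// compiled code is still in use and whether it is about to be unloaded.
struct FakeNMethod {
  bool in_use;
  bool unloading;
  bool is_in_use()    const { return in_use; }
  bool is_unloading() const { return unloading; }
};

// Mirrors the intent of fixup_callers_callsite(): only redirect the caller
// to the callee's compiled entry when the callee is safe to call directly.
static void maybe_patch_callsite(const FakeNMethod* callee_nm) {
  if (callee_nm == nullptr || !callee_nm->is_in_use() || callee_nm->is_unloading()) {
    std::printf("skip patching: callee has no usable compiled code\n");
    return;
  }
  std::printf("patch the call site to the callee's verified entry point\n");
}

int main() {
  FakeNMethod live      = { true, false };
  FakeNMethod unloading = { true, true  };
  maybe_patch_callsite(&live);       // would be patched
  maybe_patch_callsite(&unloading);  // the case the added check now skips
  return 0;
}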
------------- Commit messages: - check callee->is_unloading in fixup_callers_callsite Changes: https://git.openjdk.org/jdk/pull/10747/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10747&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294538 Stats: 22 lines in 3 files changed: 9 ins; 12 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10747.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10747/head:pull/10747 PR: https://git.openjdk.org/jdk/pull/10747 From vlivanov at openjdk.org Tue Oct 18 17:51:30 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 18 Oct 2022 17:51:30 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic Some additional info to consider: the library modifies only MXCSR register and `-XX:+RestoreMXCSROnJNICalls` makes the test to pass. $ objdump -D libfast-math.so ... 0000000000000560 : 560: f3 0f 1e fa endbr64 564: 0f ae 5c 24 fc stmxcsr -0x4(%rsp) 569: 81 4c 24 fc 40 80 00 orl $0x8040,-0x4(%rsp) 570: 00 571: 0f ae 54 24 fc ldmxcsr -0x4(%rsp) 576: c3 ret ------------- PR: https://git.openjdk.org/jdk/pull/10661 From vlivanov at openjdk.org Tue Oct 18 18:02:10 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 18 Oct 2022 18:02:10 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
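Just to make the quoted proposal concrete: saving and restoring the floating-point environment around the library load is only a few lines. A minimal standalone sketch using the C fenv interface (illustrative only -- where the JDK would actually hook this in is not shown, and the library name is simply the one used elsewhere in this thread):

#include <cfenv>
#include <cstdio>
#include <dlfcn.h>

// Load a native library while protecting the caller's floating-point
// environment from constructors that change it (e.g. the FTZ/DAZ setup
// emitted for -ffast-math).
static void* load_library_protected(const char* name) {
  std::fenv_t saved;
  std::fegetenv(&saved);                  // capture the FP control/status state
  void* handle = dlopen(name, RTLD_NOW);  // library constructors run in here
  std::fesetenv(&saved);                  // undo whatever they changed
  return handle;
}

int main() {
  void* handle = load_library_protected("libfast-math.so");
  std::printf("handle=%p\n", handle);
  if (handle != nullptr) {
    dlclose(handle);
  }
  return 0;
}

Note that whether fegetenv()/fesetenv() covers MXCSR is platform-dependent, which is part of why the discussion below focuses on MXCSR specifically.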
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic On non-windows x86-64 the JVM saves/restores (and adjusts if needed) MXCSR inside call stub as mandated by the ABI: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L262-L273 x87 is not used in x86-64-specific code anymore, so x87-specific part of FP environment (FPSR and FPCS registers) is not relevant to JVM anymore there. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From coleenp at openjdk.org Tue Oct 18 18:10:26 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 18 Oct 2022 18:10:26 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v2] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: <_-idExciGXFGQOsMFh2UvXQywTJr9UWxnsEGpCQ8UwM=.fec4784d-3494-486b-ab6f-8c06cdfd87a9@github.com> On Tue, 18 Oct 2022 13:42:40 GMT, Stefan Karlsson wrote: >> Background to this patch: >> >> This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. >> >> PR RFC: >> >> HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: >> >> MetaspaceObj - allocates in the Metaspace >> CHeap - uses malloc >> ResourceObj - ... >> >> The last class sounds like it provide an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Areanas or even CHeap allocated memory. >> >> This is IMHO misleading, and often leads to confusion among HotSpot developers. >> >> I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. >> >> In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. >> >> The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. >> >> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix Shenandoah I like this change a lot. If we change the name of AnyObj to something AnyAllocationBase, we should change the rest to ResourceAllocationBase, HeapAllocationBase etc. We should do this in a separate PR. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10745 From vlivanov at openjdk.org Tue Oct 18 18:34:27 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 18 Oct 2022 18:34:27 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <3_cE2VNWa47otWcHuc-OG5CxNWp_O_CSswd1pcFlJ9I=.e17801eb-ba0d-41bb-b405-27a405295aec@github.com> On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic So, IMO the discussion boils down to how we want a misbehaving native library to be handled by the JVM. The ABI lists MXCSR as a callee-saved register, so there's nothing wrong on JVM side from that perspective. >From a quality of implementation perspective though, JVM could do a better job at catching broken libraries. Of course, there are numerous ways for a native code to break the JVM, but in this particular case, it looks trivial to catch the problem. The question is how much overhead we can afford to introduce for that. Whether it should be an opt-in solution (e.g., `-Xcheck:jni` or `-XX:+AlwaysRestoreFPU`/`-XX:+RestoreMXCSROnJNICalls`), opt-out (unconditionally recover or report an error when FP env is corrupted, optionally providing a way to turn it off), or apply a band-aid fix just to fix the immediate problem with GCC's fast-math mode. I'd like to dissuade from going with just a band-aid fix (we already went through that multiple times with different level of success) and try to improve the overall experience JVM provides. It feels like just pushing the problem further away and it would be very unfortunate to repeat the very same exercise in the future. My preferred solution would be to automatically detect the corruption and restore MXCSR register across a JNI call, but if it turns out to be too expensive, JVM could check for MXCSR register corruption after every JNI call and crash issuing a message with diagnostic details about where corruption happened (info about library and entry) offering to turn on `-XX:+AlwaysRestoreFPU`/`-XX:+RestoreMXCSROnJNICalls` as a stop-the-gap solution. It would send users a clear signal there's something wrong with their code/environment, but still giving them an option to workaround the problem while fixing the issue. Saying that, I'd like to stress that I'm perfectly fine with addressing the general issue of misbehaving native libraries separately (if we agree it's worth it) and I trust @dholmes-ora and @theRealAph to choose the most appropriate fix for this particular bug. 
------------- PR: https://git.openjdk.org/jdk/pull/10661 From kvn at openjdk.org Tue Oct 18 18:46:10 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Oct 2022 18:46:10 GMT Subject: RFR: 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() In-Reply-To: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> References: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> Message-ID: On Tue, 18 Oct 2022 17:37:30 GMT, Dean Long wrote: > This change adds a missing is_unloading() check for the callee in SharedRuntime::fixup_callers_callsite() and removes from_compiled_entry_no_trampoline() because it is no longer used. Seems fine. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10747 From aturbanov at openjdk.org Tue Oct 18 20:28:08 2022 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Tue, 18 Oct 2022 20:28:08 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v2] In-Reply-To: References: Message-ID: On Thu, 13 Oct 2022 10:12:42 GMT, Bhavana Kilambi wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. 
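As a rough illustration of how cheap the detection itself is, here is a standalone sketch (x86-64 with GCC/Clang assumed; this is not the proposed JDK change, just the idea) that notices and undoes an MXCSR change made by a library constructor during dlopen():

#include <dlfcn.h>
#include <xmmintrin.h>
#include <cstdio>

int main() {
  const unsigned int before = _mm_getcsr();            // save the caller's MXCSR
  void* handle = dlopen("libfast-math.so", RTLD_NOW);  // constructors may set FTZ/DAZ here
  const unsigned int after = _mm_getcsr();
  if (after != before) {
    std::printf("MXCSR changed across dlopen: 0x%x -> 0x%x, restoring\n", before, after);
    _mm_setcsr(before);                                 // restore the expected FP semantics
  }
  if (handle != nullptr) {
    dlclose(handle);
  }
  return 0;
}

The same compare-and-report step, placed after a JNI call, is what the "check and crash with diagnostics" variant described above boils down to.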
> > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Modified JTREG test to include feature constraints test/hotspot/jtreg/compiler/vectorization/TestEor3AArch64.java line 44: > 42: public class TestEor3AArch64 { > 43: > 44: private final static int LENGTH = 2048; Suggestion: private static final int LENGTH = 2048; test/hotspot/jtreg/compiler/vectorization/TestEor3AArch64.java line 45: > 43: > 44: private final static int LENGTH = 2048; > 45: private final static Random RD = Utils.getRandomInstance(); let's use blessed modifiers order Suggestion: private static final Random RD = Utils.getRandomInstance(); ------------- PR: https://git.openjdk.org/jdk/pull/10407 From coleenp at openjdk.org Tue Oct 18 20:41:11 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 18 Oct 2022 20:41:11 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v3] In-Reply-To: <8iNwjxGcRg1uK4X6s7Bk0Sv4qWbYiE_2-Avo3X65xc4=.5583d1df-b49a-419a-aee6-f3f6fffb5b63@github.com> References: <8iNwjxGcRg1uK4X6s7Bk0Sv4qWbYiE_2-Avo3X65xc4=.5583d1df-b49a-419a-aee6-f3f6fffb5b63@github.com> Message-ID: On Tue, 18 Oct 2022 06:51:28 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/classPrelinker.cpp line 146: >> >>> 144: >>> 145: bool first_time; >>> 146: _processed_classes.put_if_absent(ik, &first_time); >> >> I don't see any get functions for this hashtable? > > This table is used to check if we have already worked on the class: > > > bool first_time; > _processed_classes.put_if_absent(ik, &first_time); > if (!first_time) { > return; > } Oh, ok I missed that. Can you add a comment there about what you're doing and why? ------------- PR: https://git.openjdk.org/jdk/pull/10330 From coleenp at openjdk.org Tue Oct 18 20:51:03 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 18 Oct 2022 20:51:03 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v3] In-Reply-To: <8iNwjxGcRg1uK4X6s7Bk0Sv4qWbYiE_2-Avo3X65xc4=.5583d1df-b49a-419a-aee6-f3f6fffb5b63@github.com> References: <8iNwjxGcRg1uK4X6s7Bk0Sv4qWbYiE_2-Avo3X65xc4=.5583d1df-b49a-419a-aee6-f3f6fffb5b63@github.com> Message-ID: On Tue, 18 Oct 2022 06:51:15 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/classPrelinker.cpp line 59: >> >>> 57: ClassPrelinker::ClassPrelinker() { >>> 58: assert(_singleton == NULL, "must be"); >>> 59: _singleton = this; >> >> I'm not sure why you have this? > > I was trying to put the states of the ClassPrelinker in its instance fields, but this is probably confusing. I'll change ClassPrelinker to AllStatic instead. It doesn't look like you did this. >> src/hotspot/share/cds/classPrelinker.cpp line 205: >> >>> 203: >>> 204: #if INCLUDE_CDS_JAVA_HEAP >>> 205: void ClassPrelinker::resolve_string(constantPoolHandle cp, int cp_index, TRAPS) { >> >> Is this function only needed in this cpp file? Can you make it static and defined before its caller? Are there other functions where this is true? Does it need to be a member function of ClassPrelinker? > > I prefer to make such methods private, so it can access other private members of the same class. but this doesn't reference other private members of the class. The nice thing about it being a static non-member function is that you can easily see this without looking anywhere else. Also, it can be defined above the caller, which is where it's easy to find. 
It's something that varies with developers but I like the style where you read a new file and see little function, little function, etc, leading up to the big function that calls them all. ------------- PR: https://git.openjdk.org/jdk/pull/10330 From coleenp at openjdk.org Tue Oct 18 20:51:06 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 18 Oct 2022 20:51:06 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v3] In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 15:39:08 GMT, Ioi Lam wrote: >> Some `JVM_CONSTANT_Class` entries are guaranteed to resolve to the same value at both CDS dump time and run time: >> >> - Classes that are resolved during `vmClasses::resolve_all()`. These classes cannot be replaced by JVMTI agents at run time. >> - Supertypes -- at run time, a class `C` can be loaded from the CDS archive only if all of `C`'s super types are also loaded from the CDS archive. Therefore, we know that a `JVM_CONSTANT_Class` reference to a supertype of `C` must resolved to the same value at both CDS dump time and run time. >> >> By doing the resolution at dump time, we can speed up run time start-up by a little bit. >> >> The `ClassPrelinker` class added by this PR will also be used in future REFs for pre-resolving other constant pool entries. The ultimate goal is to resolve `invokedynamic` and `invokehandle` so we can significantly improve the start-up time of features such as Lambda expressions and String concatenation. See [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed product build src/hotspot/share/cds/classPrelinker.hpp line 51: > 49: // if all of its supertypes are loaded from the CDS archive. > 50: class ClassPrelinker : public StackObj { > 51: typedef ResourceHashtable ClassesTable; You can use the new shiny using ClassesTable = ResourceHashtable instead. src/hotspot/share/oops/constantPool.cpp line 404: > 402: } > 403: > 404: bool ConstantPool::maybe_archive_resolved_klass_at(int cp_index) { yes, I think this is better here. ------------- PR: https://git.openjdk.org/jdk/pull/10330 From sviswanathan at openjdk.org Tue Oct 18 23:25:10 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 18 Oct 2022 23:25:10 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions In-Reply-To: References: Message-ID: <523ASDMlZe7mAZaBQe3ipxBLaLum7_XZqLLUUgsCJi0=.db28f521-c957-4fb2-8dcc-7c09d46189e3@github.com> On Wed, 5 Oct 2022 21:28:26 GMT, vpaprotsk wrote: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 
14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 262: > 260: private static void processMultipleBlocks(byte[] input, int offset, int length, byte[] aBytes, byte[] rBytes) { > 261: MutableIntegerModuloP A = ipl1305.getElement(aBytes).mutable(); > 262: MutableIntegerModuloP R = ipl1305.getElement(rBytes).mutable(); R doesn't need to be mutable. src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 286: > 284: * numeric values. > 285: */ > 286: private void setRSVals() { //throws InvalidKeyException { The R and S check for invalid key (all bytes zero) could be submitted as a separate PR. It is not related to the Poly1305 acceleration. test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/java.base/com/sun/crypto/provider/Poly1305IntrinsicFuzzTest.java line 39: > 37: public static void main(String[] args) throws Exception { > 38: //Note: it might be useful to increase this number during development of new Poly1305 intrinsics > 39: final int repeat = 100; Should we increase this repeat count for the c2 compiler to kick in for compiling engineUpdate() and have the call to stub in place from there? test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/java.base/com/sun/crypto/provider/Poly1305KAT.java line 133: > 131: System.out.println("*** Test " + ++testNumber + ": " + > 132: test.testName); > 133: if (runSingleTest(test)) { runSingleTest may need to be called enough number of times for the engineUpdate to be compiled by c2. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From coleenp at openjdk.org Wed Oct 19 00:09:50 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Oct 2022 00:09:50 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v2] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On Tue, 18 Oct 2022 14:40:15 GMT, Thomas Stuefe wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Shenandoah > > I like this in principle, I always found the "and also C-heap" usage of ResourceObj odd. But deciding allocation area via inheritance feels a bit restrictive. Since I then cannot have instances of a class live in different areas, I need to decide at class design time where it will live. Which also has implications for allowed lifetime (e.g. no RA before VM initialization). > > In practice, it never was a big deal, since everything can live on the stack or via composition in other objects, and ResourceObj can live on the C-heap too. But with the proposed patch ResourceObj would only live on RA, and cannot live on the C-heap nor in hand-created Arenas. > > I admit I have no good solution either. We could make allocation more flexible by accumulating all operator new variants in just a single base class, AnyObj. 
The problem is that we'd need to track the allocation type at runtime. So we'd trade performance for flexibility. @tstuefe Your sentiment is also represented internally. On the other side, I like having the convenience of having a default allocation for certain classes, or most classes even, but then there are outliers like these AnyObj classes. Designing a better allocation scheme for these rather than inheriting them through their base classes is something we've been talking about. Maybe we should have an RFC email so you can reply. ------------- PR: https://git.openjdk.org/jdk/pull/10745 From stuefe at openjdk.org Wed Oct 19 05:35:04 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 19 Oct 2022 05:35:04 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v2] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On Tue, 18 Oct 2022 14:40:15 GMT, Thomas Stuefe wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Shenandoah > > I like this in principle, I always found the "and also C-heap" usage of ResourceObj odd. But deciding allocation area via inheritance feels a bit restrictive. Since I then cannot have instances of a class live in different areas, I need to decide at class design time where it will live. Which also has implications for allowed lifetime (e.g. no RA before VM initialization). > > In practice, it never was a big deal, since everything can live on the stack or via composition in other objects, and ResourceObj can live on the C-heap too. But with the proposed patch ResourceObj would only live on RA, and cannot live on the C-heap nor in hand-created Arenas. > > I admit I have no good solution either. We could make allocation more flexible by accumulating all operator new variants in just a single base class, AnyObj. The problem is that we'd need to track the allocation type at runtime. So we'd trade performance for flexibility. > @tstuefe Your sentiment is also represented internally. On the other side, I like having the convenience of having a default allocation for certain classes, or most classes even, but then there are outliers like these AnyObj classes. Designing a better allocation scheme for these rather than inheriting them through their base classes is something we've been talking about. Maybe we should have an RFC email so you can reply. Thinking about this some more, I always could just wrap a class in a suitable holder if the class allocation scheme does not fit. The holder could even be a utility class, e.g.: allocation.hpp: template struct CHeapHolder: public CHeapObj { T v; }; x.cpp: class X: public ResourceObj { ... }; typedef CHeapHolder XInCHeap; Since that is easy, and considering that most of our objects have a clear 1:1 relation to the area they live in, maybe we should not bother carrying the type info and figuring out deletion at runtime. 
------------- PR: https://git.openjdk.org/jdk/pull/10745 From stuefe at openjdk.org Wed Oct 19 06:41:44 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 19 Oct 2022 06:41:44 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v2] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On Tue, 18 Oct 2022 13:42:40 GMT, Stefan Karlsson wrote: >> Background to this patch: >> >> This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. >> >> PR RFC: >> >> HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: >> >> MetaspaceObj - allocates in the Metaspace >> CHeap - uses malloc >> ResourceObj - ... >> >> The last class sounds like it provide an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Areanas or even CHeap allocated memory. >> >> This is IMHO misleading, and often leads to confusion among HotSpot developers. >> >> I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. >> >> In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. >> >> The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. >> >> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix Shenandoah So, `AnyObj` is the holdover from the old ResourceObj? Is the intent to move most objects to a clear allocation type in followup RFEs? I'm not sure that we need `AnyObj` at all. Or, we could get rid of it in the future. The purpose of `AnyObj` is to place an object into a different area than its class designer intended. That can be done simply via holder objects, see my earlier comment. It would remove the runtime cost of tracking the allocation type per allocation. It would be a bit less convenient since you have to go thru the holder when accessing the object. But we don't want to have many classes like these anyway. I think that many cases where we today have a ResourceObj allocated in C-Heap can actually be made CHeapObj. I'm not sure how many real cases there are where we mix C-heap and RA for the same class. ---- If we go with `AnyObj`, I have a smaller concern: `AnyObj` sounds like the typical absolute base class many frameworks have. But here, it has a specific role, and we probably want to discourage its use for new classes. It should only be used where its really needed. 
It does not have to be super convenient, and maybe should be renamed to something like `MultipleAllocationObj` or similar. Then, allocation area for `AnyObj` is determined by the overloaded new I use: - `new X;` // RA - `new (mtTest) X;` // C-Heap - `new (arena) X;` // lives in Arena but this can be confusing, especially for newcomers. The default of RA is surprising, in `ResourceObj` it was right in the name. Since RA allocation is a bit dangerous, I think RA as default can trip over people. I also dislike the MEMFLAGS==CHeap association. MEMFLAGS is semantically different from where the object lives and this may bite us later. For example, today we don't require MEMFLAG for RA allocation because the arena is tagged with a single flag. But we could easily switch to per-allocation tracking instead by allowing an arena to carry multiple MEMFLAGS. It would have some advantages. Therefore I would require the user to hand in the allocation type when calling new AnyObj. It would be clearer, and since we don't want to have many of these classes anyway, I think that would be okay. src/hotspot/share/asm/codeBuffer.hpp line 386: > 384: // CodeBuffers must be allocated on the stack except for a single > 385: // special case during expansion which is handled internally. This > 386: // is done to guarantee proper cleanup of resources. Not your patch, but comment is misleading. Probably should say "on the ResourceArea". ------------- PR: https://git.openjdk.org/jdk/pull/10745 From jnordstrom at openjdk.org Wed Oct 19 06:54:08 2022 From: jnordstrom at openjdk.org (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Wed, 19 Oct 2022 06:54:08 GMT Subject: RFR: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events [v2] In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 08:52:11 GMT, Joakim Nordstr?m wrote: >> Changed the JFR chunk rotation lock object to specific internal class. This allows that specific Object.wait() event to be skipped, thus not adding JFR internal noise to recordings. >> >> # Testing >> - jdk_jfr > > Joakim Nordstr?m has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Merge branch 'master' into JDK-8286707-jfr-dont-commit-jfr-internal-jdk-javamonitorwait-events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events I'm realizing that the monitor class isn't available elsewhere; it is in objectMonitor.cpp, and passed to private members in the `JavaObjectMonitorEvent`, which in turn is generated. The only possibility I see to moving the logic away from objectMonitor.cpp is to change how the event's classes are generated. Either by adding getter to certain members (in this case something like `get_monitorClass `to `JavaObjectMonitorEvent`), or trying to shoehorn the `is_excluded` method into the generated `JavaObjectMonitorEvent`. Neither of these approaches would I say are even worth considering, and they'd also still introduce overhead. 
------------- PR: https://git.openjdk.org/jdk/pull/8883 From dlong at openjdk.org Wed Oct 19 07:02:57 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Oct 2022 07:02:57 GMT Subject: RFR: 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() In-Reply-To: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> References: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> Message-ID: On Tue, 18 Oct 2022 17:37:30 GMT, Dean Long wrote: > This change adds a missing is_unloading() check for the callee in SharedRuntime::fixup_callers_callsite() and removes from_compiled_entry_no_trampoline() because it is no longer used. Thanks Vladimir. ------------- PR: https://git.openjdk.org/jdk/pull/10747 From jbhateja at openjdk.org Wed Oct 19 07:46:01 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 19 Oct 2022 07:46:01 GMT Subject: RFR: 8293409: [vectorapi] Intrinsify VectorSupport.indexVector [v2] In-Reply-To: References: Message-ID: <2QQplnsxNj7THlLJteQ3EfmBcqNnSqH4he4vso9PjLk=.2be4a7e8-3165-479b-8dce-57a312e54ae1@github.com> On Tue, 18 Oct 2022 01:44:21 GMT, Xiaohong Gong wrote: >> "`VectorSupport.indexVector()`" is used to compute a vector that contains the index values based on a given vector and a scale value (`i.e. index = vec + iota * scale`). This function is widely used in other APIs like "`VectorMask.indexInRange`" which is useful to the tail loop vectorization. And it can be easily implemented with the vector instructions. >> >> This patch adds the vector intrinsic implementation of it. The steps are: >> >> 1) Load the const "iota" vector. >> >> We extend the "`vector_iota_indices`" stubs from byte to other integral types. For floating point vectors, it needs an additional vector cast to get the right iota values. >> >> 2) Compute indexes with "`vec + iota * scale`" >> >> Here is the performance result to the new added micro benchmark on ARM NEON: >> >> Benchmark Gain >> IndexVectorBenchmark.byteIndexVector 1.477 >> IndexVectorBenchmark.doubleIndexVector 5.031 >> IndexVectorBenchmark.floatIndexVector 5.342 >> IndexVectorBenchmark.intIndexVector 5.529 >> IndexVectorBenchmark.longIndexVector 3.177 >> IndexVectorBenchmark.shortIndexVector 5.841 >> >> >> Please help to review and share the feedback! Thanks in advance! > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Add the floating point support for VectorLoadConst and remove the VectorCast > - Merge branch 'master' into JDK-8293409 > - 8293409: [vectorapi] Intrinsify VectorSupport.indexVector Hi @XiaohongGong , patch now shows significant gains on both AVX512 and legacy X86 targets. X86 and common IR changes LGTM, thanks! ------------- Marked as reviewed by jbhateja (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10332 From xgong at openjdk.org Wed Oct 19 07:50:13 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 19 Oct 2022 07:50:13 GMT Subject: RFR: 8293409: [vectorapi] Intrinsify VectorSupport.indexVector [v2] In-Reply-To: <2QQplnsxNj7THlLJteQ3EfmBcqNnSqH4he4vso9PjLk=.2be4a7e8-3165-479b-8dce-57a312e54ae1@github.com> References: <2QQplnsxNj7THlLJteQ3EfmBcqNnSqH4he4vso9PjLk=.2be4a7e8-3165-479b-8dce-57a312e54ae1@github.com> Message-ID: <5DXiOa_G0UGgdQwT3hY3SJtrV48jKRtjtXdjO7nLNL8=.d9497a92-ae5d-43da-bf39-1c53106e74f1@github.com> On Wed, 19 Oct 2022 07:43:33 GMT, Jatin Bhateja wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Add the floating point support for VectorLoadConst and remove the VectorCast >> - Merge branch 'master' into JDK-8293409 >> - 8293409: [vectorapi] Intrinsify VectorSupport.indexVector > > Hi @XiaohongGong , patch now shows significant gains on both AVX512 and legacy X86 targets. > > X86 and common IR changes LGTM, thanks! Thanks for the review @jatin-bhateja @theRealELiu ! ------------- PR: https://git.openjdk.org/jdk/pull/10332 From dholmes at openjdk.org Wed Oct 19 07:57:12 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Oct 2022 07:57:12 GMT Subject: RFR: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events [v2] In-Reply-To: References: Message-ID: On Fri, 14 Oct 2022 08:52:11 GMT, Joakim Nordstr?m wrote: >> Changed the JFR chunk rotation lock object to specific internal class. This allows that specific Object.wait() event to be skipped, thus not adding JFR internal noise to recordings. >> >> # Testing >> - jdk_jfr > > Joakim Nordstr?m has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Changed name to CHUNK_ROTATION_MONITOR and made some other rearrangements > - Merge branch 'master' into JDK-8286707-jfr-dont-commit-jfr-internal-jdk-javamonitorwait-events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events > - 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events Thank you for making the effort to investigate alternatives, it is appreciated. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/8883 From xlinzheng at openjdk.org Wed Oct 19 08:29:27 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 19 Oct 2022 08:29:27 GMT Subject: RFR: 8295646: Ignore zero pairs in address descriptors read by dwarf parser Message-ID: RISC-V generates debuginfo like > readelf --debug-dump=aranges build/linux-riscv64-server-fastdebug/images/test/hotspot/gtest/server/libjvm.so ... Length: 1756 Version: 2 Offset into .debug_info: 0x4bc5e9 Pointer Size: 8 Segment Size: 0 Address Length 0000000000344ece 0000000000004a2c 0000000000000000 0000000000000000 <= 0000000000000000 0000000000000000 <= 0000000000000000 0000000000000000 <= 00000000003498fa 0000000000000016 0000000000349910 0000000000000016 .... 
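For reference, the scalar meaning of the operation being intrinsified is just index[i] = vec[i] + i * scale; a tiny standalone C++ sketch of those semantics (lane count and values are made up):

#include <cstdio>

int main() {
  const int LANES = 8;
  int vec[LANES], iota[LANES], index[LANES];
  for (int i = 0; i < LANES; i++) {
    iota[i] = i;      // the constant "iota" vector: 0, 1, 2, ...
    vec[i]  = 100;    // e.g. a broadcast of the current loop index
  }
  const int scale = 2;
  for (int i = 0; i < LANES; i++) {
    index[i] = vec[i] + iota[i] * scale;   // what the new node computes per lane
  }
  for (int i = 0; i < LANES; i++) {
    std::printf("%d ", index[i]);          // prints 100 102 104 ... 114
  }
  std::printf("\n");
  return 0;
}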
000000000026d5b8 0000000000000b9a 000000000034a532 0000000000000628 000000000034ab5a 00000000000002ac 0000000000000000 0000000000000000 <= 0000000000000000 0000000000000000 0000000000000000 0000000000000000 000000000034ae06 0000000000000bee 000000000034b9f4 0000000000000660 000000000034c054 00000000000005aa 0000000000000000 0000000000000000 0000000000000000 0000000000000000 <= 000000000034c5fe 0000000000000af2 000000000034d0f0 0000000000000f16 000000000034e006 0000000000000b4a 0000000000000000 0000000000000000 0000000000000000 0000000000000000 000000000026e152 000000000000000e 0000000000000000 0000000000000000 Our dwarf parser (gdb's dwarf parser before this April is as well [1], which encountered the same issue on RISC-V) uses `address == 0 && size == 0` in `is_terminating_entry()` to detect terminations of an arange section, which will early terminate parsing RISC-V's debuginfo so that the result would not look correctly with tests fail. The `_header._unit_length` is read but not used and it is the real length which can determine the section's end, so we can use it to get the end position of a section instead of `address == 0 && size == 0` checks to fix this issue. Also, the reason why `readelf` has no such issue is it also uses the same approach to determine the end position. [2] Tests added along with the dwarf parser patch are all tested and passed on x86_64, aarch64 and riscv64. Running a tier1 sanity test now. Thanks, Xiaolin [1] https://github.com/bminor/binutils-gdb/commit/1a7c41d5ece7d0d1aa77d8019ee46f03181854fa [2] https://github.com/bminor/binutils-gdb/blob/fd320c4c29c9a1915d24a68a167a5fd6d2c27e60/binutils/dwarf.c#L7499 ------------- Commit messages: - Fix for dwarf parser Changes: https://git.openjdk.org/jdk/pull/10758/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10758&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295646 Stats: 11 lines in 2 files changed: 6 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10758.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10758/head:pull/10758 PR: https://git.openjdk.org/jdk/pull/10758 From shade at openjdk.org Wed Oct 19 08:43:24 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Oct 2022 08:43:24 GMT Subject: Integrated: 8294211: Zero: Decode arch-specific error context if possible In-Reply-To: <68x5tTa2dHOc-tAj6OfoLSK2MNK2GN2io8qj1POUT2I=.b4ef9779-3b2a-4af4-aa1b-166375420547@github.com> References: <68x5tTa2dHOc-tAj6OfoLSK2MNK2GN2io8qj1POUT2I=.b4ef9779-3b2a-4af4-aa1b-166375420547@github.com> Message-ID: On Thu, 22 Sep 2022 18:20:28 GMT, Aleksey Shipilev wrote: > After POSIX signal refactorings, Zero error handling had "regressed" a bit: Zero always gets `NULL` as `pc` in error handling code, and thus it fails with SEGV at pc=0x0. We can do better by implementing context decoding where possible. > > Unfortunately, this introduces some arch-specific code in Zero code. The arch-specific code is copy-pasted (with inline definitions, if needed) from the relevant `os_linux_*.cpp` files. The unimplemented arches would still report the same confusing `hs_err`-s. We can emulate (and thus test) the generic behavior using new diagnostic VM option. > > This reverts parts of [JDK-8259392](https://bugs.openjdk.org/browse/JDK-8259392). > > Sample test: > > > import java.lang.reflect.*; > import sun.misc.Unsafe; > > public class Crash { > public static void main(String... 
args) throws Exception { > Field f = Unsafe.class.getDeclaredField("theUnsafe"); > f.setAccessible(true); > Unsafe u = (Unsafe) f.get(null); > u.getInt(42); // accesing via broken ptr > } > } > > > Linux x86_64 Zero fastdebug crash currently: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x0000000000000000, pid=538793, tid=538794 > # > ... > # (no native frame info) > ... > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a > > > Linux x86_64 Zero fastdebug crash with this patch: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007fbbbf08b584, pid=520119, tid=520120 > # > ... > # Problematic frame: > # V [libjvm.so+0xcbe584] Unsafe_GetInt+0xe4 > .... > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a > > > Linux x86_64 Zero fastdebug crash with this patch and `-XX:-DecodeErrorContext`: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x0000000000000000, pid=520268, tid=520269 > # > ... > # Problematic frame: > # C 0x0000000000000000 > ... > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002a > > > Additional testing: > - [x] Linux x86_64 Zero fastdebug eyeballing crash logs > - [x] Linux x86_64 Zero fastdebug, `tier1` > - [x] Linux {x86_64, x86_32, aarch64, arm, riscv64, s390x, ppc64le, ppc64be} Zero fastdebug builds This pull request has now been integrated. Changeset: 3f3d63d0 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/3f3d63d02ada66d5739e690d786684d25dc59004 Stats: 158 lines in 4 files changed: 132 ins; 11 del; 15 mod 8294211: Zero: Decode arch-specific error context if possible Reviewed-by: stuefe, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/10397 From jsjolen at openjdk.org Wed Oct 19 09:00:14 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 19 Oct 2022 09:00:14 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v3] In-Reply-To: References: Message-ID: > Hi! > > This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. 
Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Remove unnecessary -Xlog:disable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10645/files - new: https://git.openjdk.org/jdk/pull/10645/files/46429fb0..5b9f53d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10645&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10645&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10645.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10645/head:pull/10645 PR: https://git.openjdk.org/jdk/pull/10645 From jsjolen at openjdk.org Wed Oct 19 09:00:15 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 19 Oct 2022 09:00:15 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v2] In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 17:06:52 GMT, Vladimir Kozlov wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix review comments > > test/hotspot/jtreg/compiler/uncommontrap/TestDeoptOOM.java line 45: > >> 43: * -XX:CompileCommand=exclude,compiler.uncommontrap.TestDeoptOOM::m9_1 >> 44: * -XX:+UnlockDiagnosticVMOptions >> 45: * -XX:+UseZGC -XX:+LogCompilation -Xlog:disable -Xlog:deoptimization=trace -XX:+TraceDeoptimization -XX:+Verbose > > Why you need `-Xlog:disable` ? Short version: Mistake on my part, it shouldn't be there. Long version: `-Xlog:disable` disables the default output to stdout, this is important when you want your logs to *not* go to stdout at all. Of course, we want our logs to go to stdout, so this is unnecessary here. ------------- PR: https://git.openjdk.org/jdk/pull/10645 From xgong at openjdk.org Wed Oct 19 09:28:04 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 19 Oct 2022 09:28:04 GMT Subject: Integrated: 8293409: [vectorapi] Intrinsify VectorSupport.indexVector In-Reply-To: References: Message-ID: On Mon, 19 Sep 2022 08:51:24 GMT, Xiaohong Gong wrote: > "`VectorSupport.indexVector()`" is used to compute a vector that contains the index values based on a given vector and a scale value (`i.e. index = vec + iota * scale`). This function is widely used in other APIs like "`VectorMask.indexInRange`" which is useful to the tail loop vectorization. And it can be easily implemented with the vector instructions. > > This patch adds the vector intrinsic implementation of it. The steps are: > > 1) Load the const "iota" vector. > > We extend the "`vector_iota_indices`" stubs from byte to other integral types. For floating point vectors, it needs an additional vector cast to get the right iota values. > > 2) Compute indexes with "`vec + iota * scale`" > > Here is the performance result to the new added micro benchmark on ARM NEON: > > Benchmark Gain > IndexVectorBenchmark.byteIndexVector 1.477 > IndexVectorBenchmark.doubleIndexVector 5.031 > IndexVectorBenchmark.floatIndexVector 5.342 > IndexVectorBenchmark.intIndexVector 5.529 > IndexVectorBenchmark.longIndexVector 3.177 > IndexVectorBenchmark.shortIndexVector 5.841 > > > Please help to review and share the feedback! Thanks in advance! This pull request has now been integrated. 
Changeset: 857b0f9b Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/857b0f9b05bc711f3282a0da85fcff131fffab91 Stats: 391 lines in 14 files changed: 361 ins; 9 del; 21 mod 8293409: [vectorapi] Intrinsify VectorSupport.indexVector Reviewed-by: eliu, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/10332 From jnordstrom at openjdk.org Wed Oct 19 10:13:03 2022 From: jnordstrom at openjdk.org (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Wed, 19 Oct 2022 10:13:03 GMT Subject: RFR: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events [v2] In-Reply-To: <1XyiaRUBadonxQ_XjryHn5duKVmXy931G9QleAH0iDA=.da2ccc52-50ab-491e-bc1e-7e5c133f6fcf@github.com> References: <1XyiaRUBadonxQ_XjryHn5duKVmXy931G9QleAH0iDA=.da2ccc52-50ab-491e-bc1e-7e5c133f6fcf@github.com> Message-ID: On Tue, 18 Oct 2022 00:50:42 GMT, David Holmes wrote: >> There exist a general exclusion/inclusion mechanism already. But it is an all-or-nothing proposition. This particular case is a thread that we can't exclude because it runs the periodic events, upon being notified. It is the notification mechanism to run the periodic events that trigger this large amount of unnecessary MonitorWait events. Even should we change it to some util.concurrent construct, we are only pushing the problem, because we might be instrumenting them later too. To work with the existing exclusion mechanism, the system would have to introduce an additional thread, which will be excluded, which only handles the notification, and then by some other means triggers another periodic thread (included) to run the periodic events. > > @mgronlun my request is that this filtering be done inside the commit logic by the JFR code, not at the site where the event is generated - ie this internal-jfr-event filtering is internalized into the JFR code. Thank you for reviewing and comments, @dholmes-ora, @egahlin and @mgronlun! Would anyone of you sponsor this as well? Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/8883 From jsjolen at openjdk.org Wed Oct 19 10:21:09 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 19 Oct 2022 10:21:09 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v2] In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 17:07:48 GMT, Vladimir Kozlov wrote: > Nice work. Can you show an example of new output vs old? New: `-Xlog:deoptimization=trace:deopt.txt` [1,923s][debug][deoptimization] Expressions size: 1 [1,923s][debug][deoptimization] - Reconstructed expression 0 (OBJECT): NULL [1,923s][debug][deoptimization] Locals size: 3 [1,923s][debug][deoptimization] - Reconstructed local 0 (OBJECT): java/lang/String [1,923s][debug][deoptimization] - Reconstructed local 1 (OBJECT): NULL [1,924s][debug][deoptimization] [1. 
Interpreted Frame] [1,924s][debug][deoptimization] Interpreted frame (sp=0x00007fd07e205548 unextended sp=0x00007fd07e205548, fp=0x00007fd07e205598, real_fp=0x00007fd07e205598, pc=0x00007fd0704506a0) [1,924s][debug][deoptimization] ~deoptimization entry points [0x00007fd0704506a0, 0x00007fd070453b38] 13464 bytes [1,924s][debug][deoptimization] BufferBlob (0x00007fd070431510) used for Interpreter [1,924s][debug][deoptimization] - local [0x000000011d00d0e8]; #0 [1,924s][debug][deoptimization] - local [0x0000000000000000]; #1 [1,924s][debug][deoptimization] - local [0x0000000000000000]; #2 [1,924s][debug][deoptimization] - stack [0x0000000000000000]; #0 [1,924s][debug][deoptimization] - monitor[0x00007fd07e205550] [1,924s][debug][deoptimization] - bcp [0x00007fd02800b230]; @8 [1,924s][debug][deoptimization] - locals [0x00007fd07e2055b8] [1,924s][debug][deoptimization] - method [0x00007fd02800b298]; virtual jboolean java.lang.String.equals(jobject) [1,924s][debug][deoptimization] Old `-XX:+WizardMode -XX:+Verbose -XX:+PrintDeoptimizationDetails`: Expressions size: 1 - Reconstructed expression 0 (OBJECT): NULL Locals size: 3 - Reconstructed local 0 (OBJECT): java/lang/String - Reconstructed local 1 (OBJECT): NULL [1. Interpreted Frame] Interpreted frame (sp=0x00007fc830059548 unextended sp=0x00007fc830059548, fp=0x00007fc830059598, real_fp=0x00007fc830059598, pc=0x00007fc8204506a0) ~deoptimization entry points [0x00007fc8204506a0, 0x00007fc820453b38] 13464 bytes BufferBlob (0x00007fc820431510) used for Interpreter Trying to load: /home/johan/jdk/build/linux-x64-slowdebug/jdk/lib/server/libhsdis-amd64.so Trying to load: /home/johan/jdk/build/linux-x64-slowdebug/jdk/lib/server/hsdis-amd64.so Trying to load: /home/johan/jdk/build/linux-x64-slowdebug/jdk/lib/hsdis-amd64.so Trying to load: hsdis-amd64.so [1,857s][warning][os] Loading hsdis library failed Could not load hsdis-amd64.so; hsdis-amd64.so: cannot open shared object file: No such file or directory; PrintAssembly defaults to abstract disassembly. [MachCode] // Snip! Huge amount of logs here [/MachCode] ------------- PR: https://git.openjdk.org/jdk/pull/10645 From chagedorn at openjdk.org Wed Oct 19 10:22:27 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Oct 2022 10:22:27 GMT Subject: RFR: 8295646: Ignore zero pairs in address descriptors read by dwarf parser In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 08:22:01 GMT, Xiaolin Zheng wrote: > RISC-V generates debuginfo like > > >> readelf --debug-dump=aranges build/linux-riscv64-server-fastdebug/images/test/hotspot/gtest/server/libjvm.so > > ... > Length: 1756 > Version: 2 > Offset into .debug_info: 0x4bc5e9 > Pointer Size: 8 > Segment Size: 0 > > Address Length > 0000000000344ece 0000000000004a2c > 0000000000000000 0000000000000000 <= > 0000000000000000 0000000000000000 <= > 0000000000000000 0000000000000000 <= > 00000000003498fa 0000000000000016 > 0000000000349910 0000000000000016 > .... 
> 000000000026d5b8 0000000000000b9a > 000000000034a532 0000000000000628 > 000000000034ab5a 00000000000002ac > 0000000000000000 0000000000000000 <= > 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 > 000000000034ae06 0000000000000bee > 000000000034b9f4 0000000000000660 > 000000000034c054 00000000000005aa > 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 <= > 000000000034c5fe 0000000000000af2 > 000000000034d0f0 0000000000000f16 > 000000000034e006 0000000000000b4a > 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 > 000000000026e152 000000000000000e > 0000000000000000 0000000000000000 > > > Our dwarf parser (gdb's dwarf parser before this April is as well [1], which encountered the same issue on RISC-V) uses `address == 0 && size == 0` in `is_terminating_entry()` to detect terminations of an arange section, which will early terminate parsing RISC-V's debuginfo at an "apparent terminator" described in [1] so that the result would not look correct with tests failures. The `_header._unit_length` is read but not used and it is the real length that can determine the section's end, so we can use it to get the end position of a section instead of `address == 0 && size == 0` checks to fix this issue. > > Also, the reason why `readelf` has no such issue is it also uses the same approach to determine the end position. [2] > > Tests added along with the dwarf parser patch are all tested and passed on x86_64, aarch64, and riscv64. > Running a tier1 sanity test now. > > Thanks, > Xiaolin > > [1] https://github.com/bminor/binutils-gdb/commit/1a7c41d5ece7d0d1aa77d8019ee46f03181854fa > [2] https://github.com/bminor/binutils-gdb/blob/fd320c4c29c9a1915d24a68a167a5fd6d2c27e60/binutils/dwarf.c#L7594 Marked as reviewed by chagedorn (Reviewer). Hi Xiaolin > Our dwarf parser (gdb's dwarf parser before this April is as well [1], which encountered the same issue on RISC-V) uses address == 0 && size == 0 in is_terminating_entry() to detect terminations of an arange section, which will early terminate parsing RISC-V's debuginfo so that the result would not look correctly with tests failures. The _header._unit_length is read but not used and it is the real length which can determine the section's end, so we can use it to get the end position of a section instead of address == 0 && size == 0 checks to fix this issue. That's interesting that the emitted format is not compliant with the official DWARF spec. I've encountered such inconsistencies at other places as well where GCC does something differently. Anyways, in that case, your fix makes sense to read the entire set by taking the `_unit_length` field instead of relying on `(0, 0)` being the terminating entry (which would normally be the same result). We could additionally assert that the real terminating entry is indeed `(0, 0)` as specified in the spec. But you need to check if that is the case on RISC-V. Thanks, Christian ------------- PR: https://git.openjdk.org/jdk/pull/10758 From iwalulya at openjdk.org Wed Oct 19 10:24:28 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 19 Oct 2022 10:24:28 GMT Subject: RFR: 8233697: CHT: Iteration parallelization Message-ID: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Hi, Please review this change to add parallel iteration of the ConcurrentHashTable. The iteration should be done during a safepoint without concurrent modifications to the ConcurrentHashTable. 
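A generic sketch of the kind of work-claiming such a parallel scan can use -- this is not the ConcurrentHashTable API, the class below is invented for illustration, and it assumes the buckets are stable because the iteration runs inside a safepoint:

    #include <algorithm>
    #include <atomic>
    #include <cstddef>

    // Workers repeatedly claim disjoint bucket ranges until the table is exhausted.
    class BucketRangeClaimer {
      std::atomic<size_t> _next{0};
      const size_t _num_buckets;
      const size_t _claim_size;
    public:
      BucketRangeClaimer(size_t num_buckets, size_t claim_size)
        : _num_buckets(num_buckets), _claim_size(claim_size) {}

      // Returns false once every bucket range has been handed out.
      bool claim(size_t& start, size_t& end) {
        size_t from = _next.fetch_add(_claim_size, std::memory_order_relaxed);
        if (from >= _num_buckets) {
          return false;
        }
        start = from;
        end = std::min(from + _claim_size, _num_buckets);
        return true;
      }
    };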
Usecase is in parallelizing the merging of large remsets for G1. Testing: tier 1-3 ------------- Commit messages: - 8233697: CHT: Iteration parallelization Changes: https://git.openjdk.org/jdk/pull/10759/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10759&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8233697 Stats: 200 lines in 3 files changed: 200 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10759.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10759/head:pull/10759 PR: https://git.openjdk.org/jdk/pull/10759 From luhenry at openjdk.org Wed Oct 19 10:28:20 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 19 Oct 2022 10:28:20 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v6] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: - Explicit use of temp registers - fixup! Add -XX:CacheLineSize= to set cache line size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/de0f1a28..fc7d123e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=04-05 Stats: 65 lines in 5 files changed: 43 ins; 0 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Wed Oct 19 10:28:20 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 19 Oct 2022 10:28:20 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v4] In-Reply-To: <_-asW3wRumLQ-iYayP1ULZH619Y8Upg_gjNjgUMnZAY=.8510e2dc-f782-4f13-9a89-3cc510e7870f@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <24CtJ7csuki9OSb8x1wM-7Rn3-4BOmXfUVok0RkAq08=.77fbd1a6-0654-4853-9142-021b960510c8@github.com> <_-asW3wRumLQ-iYayP1ULZH619Y8Upg_gjNjgUMnZAY=.8510e2dc-f782-4f13-9a89-3cc510e7870f@github.com> Message-ID: <0ki-Np9xRZY67ckDevu2Wf7ukvQ4yaed8Ux8xTsL7kU=.9779c39f-32cb-4bbe-aa2d-bd105fefa3e6@github.com> On Tue, 18 Oct 2022 13:09:27 GMT, Yadong Wang wrote: >> Given it's only made to be called from `StubRoutine::zero_blocks` stub routine and `t0-t1` are temporary registers and `t2` (aka `x7`) is caller-saved, I don't understand why it needs to be made aware for C2? >> >> I'll add them as `tmp0-tmp3` arguments to `MacroAssembler::dcache_zero_blocks` to make sure any future caller of this will be aware. > > C2 generates ClearArrayNodes, which emit zero_words -> zero_blocks directly. I don't find any caller-saving logic there. t0 is free to use (not participating in register allocation), but t1 is used as condition code in C2. base and cnt registers can be colloberred safely because they were indentified as USE_KILL. 
I've updated it to use `x29`, `x30`, and `x7`, and to explicit `TEMP` them in `riscv.ad`. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From xlinzheng at openjdk.org Wed Oct 19 10:33:55 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 19 Oct 2022 10:33:55 GMT Subject: RFR: 8295646: Ignore zero pairs in address descriptors read by dwarf parser In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 10:18:59 GMT, Christian Hagedorn wrote: > We could additionally assert that the real terminating entry is indeed `(0, 0)` as specified in the spec. But you need to check if that is the case on RISC-V. Thanks for your time for reviewing! In fact, before submitting this PR I had one assertion to assert when the new `is_terminating_entry()` returns true, we must encounter a pair of zero. Though I removed it at last because I wanted to fully mirror what Binutils is doing. Of course, I think the assertion is reasonable and on RISC-V it is the same case that the real terminating entry is a pair of 0. So going to add that assertion back. ------------- PR: https://git.openjdk.org/jdk/pull/10758 From jnordstrom at openjdk.org Wed Oct 19 10:37:07 2022 From: jnordstrom at openjdk.org (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Wed, 19 Oct 2022 10:37:07 GMT Subject: Integrated: 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events In-Reply-To: References: Message-ID: On Wed, 25 May 2022 12:24:03 GMT, Joakim Nordstr?m wrote: > Changed the JFR chunk rotation lock object to specific internal class. This allows that specific Object.wait() event to be skipped, thus not adding JFR internal noise to recordings. > > # Testing > - jdk_jfr This pull request has now been integrated. Changeset: fc889577 Author: Joakim Nordstr?m Committer: Markus Gr?nlund URL: https://git.openjdk.org/jdk/commit/fc889577eaf3f564d896818c1d9b1eb6fa5a8758 Stats: 29 lines in 6 files changed: 20 ins; 3 del; 6 mod 8286707: JFR: Don't commit JFR internal jdk.JavaMonitorWait events Reviewed-by: dholmes, egahlin ------------- PR: https://git.openjdk.org/jdk/pull/8883 From xlinzheng at openjdk.org Wed Oct 19 12:11:30 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 19 Oct 2022 12:11:30 GMT Subject: RFR: 8295646: Ignore zero pairs in address descriptors read by dwarf parser [v2] In-Reply-To: References: Message-ID: > RISC-V generates debuginfo like > > >> readelf --debug-dump=aranges build/linux-riscv64-server-fastdebug/images/test/hotspot/gtest/server/libjvm.so > > ... > Length: 1756 > Version: 2 > Offset into .debug_info: 0x4bc5e9 > Pointer Size: 8 > Segment Size: 0 > > Address Length > 0000000000344ece 0000000000004a2c > 0000000000000000 0000000000000000 <= > 0000000000000000 0000000000000000 <= > 0000000000000000 0000000000000000 <= > 00000000003498fa 0000000000000016 > 0000000000349910 0000000000000016 > .... 
> 000000000026d5b8 0000000000000b9a > 000000000034a532 0000000000000628 > 000000000034ab5a 00000000000002ac > 0000000000000000 0000000000000000 <= > 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 > 000000000034ae06 0000000000000bee > 000000000034b9f4 0000000000000660 > 000000000034c054 00000000000005aa > 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 <= > 000000000034c5fe 0000000000000af2 > 000000000034d0f0 0000000000000f16 > 000000000034e006 0000000000000b4a > 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 > 000000000026e152 000000000000000e > 0000000000000000 0000000000000000 > > > Our dwarf parser (gdb's dwarf parser before this April is as well [1], which encountered the same issue on RISC-V) uses `address == 0 && size == 0` in `is_terminating_entry()` to detect terminations of an arange section, which will early terminate parsing RISC-V's debuginfo at an "apparent terminator" described in [1] so that the result would not look correct with tests failures. The `_header._unit_length` is read but not used and it is the real length that can determine the section's end, so we can use it to get the end position of a section instead of `address == 0 && size == 0` checks to fix this issue. > > Also, the reason why `readelf` has no such issue is it also uses the same approach to determine the end position. [2] > > Tests added along with the dwarf parser patch are all tested and passed on x86_64, aarch64, and riscv64. > Running a tier1 sanity test now. > > Thanks, > Xiaolin > > [1] https://github.com/bminor/binutils-gdb/commit/1a7c41d5ece7d0d1aa77d8019ee46f03181854fa > [2] https://github.com/bminor/binutils-gdb/blob/fd320c4c29c9a1915d24a68a167a5fd6d2c27e60/binutils/dwarf.c#L7594 Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: Add the assertion back ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10758/files - new: https://git.openjdk.org/jdk/pull/10758/files/c80408ed..76904650 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10758&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10758&range=00-01 Stats: 9 lines in 2 files changed: 5 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10758.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10758/head:pull/10758 PR: https://git.openjdk.org/jdk/pull/10758 From xlinzheng at openjdk.org Wed Oct 19 12:11:30 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 19 Oct 2022 12:11:30 GMT Subject: RFR: 8295646: Ignore zero pairs in address descriptors read by dwarf parser In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 08:22:01 GMT, Xiaolin Zheng wrote: > RISC-V generates debuginfo like > > >> readelf --debug-dump=aranges build/linux-riscv64-server-fastdebug/images/test/hotspot/gtest/server/libjvm.so > > ... > Length: 1756 > Version: 2 > Offset into .debug_info: 0x4bc5e9 > Pointer Size: 8 > Segment Size: 0 > > Address Length > 0000000000344ece 0000000000004a2c > 0000000000000000 0000000000000000 <= > 0000000000000000 0000000000000000 <= > 0000000000000000 0000000000000000 <= > 00000000003498fa 0000000000000016 > 0000000000349910 0000000000000016 > .... 
> 000000000026d5b8 0000000000000b9a > 000000000034a532 0000000000000628 > 000000000034ab5a 00000000000002ac > 0000000000000000 0000000000000000 <= > 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 > 000000000034ae06 0000000000000bee > 000000000034b9f4 0000000000000660 > 000000000034c054 00000000000005aa > 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 <= > 000000000034c5fe 0000000000000af2 > 000000000034d0f0 0000000000000f16 > 000000000034e006 0000000000000b4a > 0000000000000000 0000000000000000 > 0000000000000000 0000000000000000 > 000000000026e152 000000000000000e > 0000000000000000 0000000000000000 > > > Our dwarf parser (gdb's dwarf parser before this April is as well [1], which encountered the same issue on RISC-V) uses `address == 0 && size == 0` in `is_terminating_entry()` to detect terminations of an arange section, which will early terminate parsing RISC-V's debuginfo at an "apparent terminator" described in [1] so that the result would not look correct with tests failures. The `_header._unit_length` is read but not used and it is the real length that can determine the section's end, so we can use it to get the end position of a section instead of `address == 0 && size == 0` checks to fix this issue. > > Also, the reason why `readelf` has no such issue is it also uses the same approach to determine the end position. [2] > > Tests added along with the dwarf parser patch are all tested and passed on x86_64, aarch64, and riscv64. > Running a tier1 sanity test now. > > Thanks, > Xiaolin > > [1] https://github.com/bminor/binutils-gdb/commit/1a7c41d5ece7d0d1aa77d8019ee46f03181854fa > [2] https://github.com/bminor/binutils-gdb/blob/fd320c4c29c9a1915d24a68a167a5fd6d2c27e60/binutils/dwarf.c#L7594 Related tests also pass on the three platforms. ------------- PR: https://git.openjdk.org/jdk/pull/10758 From chagedorn at openjdk.org Wed Oct 19 12:46:04 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Oct 2022 12:46:04 GMT Subject: RFR: 8295646: Ignore zero pairs in address descriptors read by dwarf parser [v2] In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 12:11:30 GMT, Xiaolin Zheng wrote: >> RISC-V generates debuginfo like >> >> >>> readelf --debug-dump=aranges build/linux-riscv64-server-fastdebug/images/test/hotspot/gtest/server/libjvm.so >> >> ... >> Length: 1756 >> Version: 2 >> Offset into .debug_info: 0x4bc5e9 >> Pointer Size: 8 >> Segment Size: 0 >> >> Address Length >> 0000000000344ece 0000000000004a2c >> 0000000000000000 0000000000000000 <= >> 0000000000000000 0000000000000000 <= >> 0000000000000000 0000000000000000 <= >> 00000000003498fa 0000000000000016 >> 0000000000349910 0000000000000016 >> .... 
>> 000000000026d5b8 0000000000000b9a >> 000000000034a532 0000000000000628 >> 000000000034ab5a 00000000000002ac >> 0000000000000000 0000000000000000 <= >> 0000000000000000 0000000000000000 >> 0000000000000000 0000000000000000 >> 000000000034ae06 0000000000000bee >> 000000000034b9f4 0000000000000660 >> 000000000034c054 00000000000005aa >> 0000000000000000 0000000000000000 >> 0000000000000000 0000000000000000 <= >> 000000000034c5fe 0000000000000af2 >> 000000000034d0f0 0000000000000f16 >> 000000000034e006 0000000000000b4a >> 0000000000000000 0000000000000000 >> 0000000000000000 0000000000000000 >> 000000000026e152 000000000000000e >> 0000000000000000 0000000000000000 >> >> >> Our dwarf parser (gdb's dwarf parser before this April is as well [1], which encountered the same issue on RISC-V) uses `address == 0 && size == 0` in `is_terminating_entry()` to detect terminations of an arange section, which will early terminate parsing RISC-V's debuginfo at an "apparent terminator" described in [1] so that the result would not look correct with tests failures. The `_header._unit_length` is read but not used and it is the real length that can determine the section's end, so we can use it to get the end position of a section instead of `address == 0 && size == 0` checks to fix this issue. >> >> Also, the reason why `readelf` has no such issue is it also uses the same approach to determine the end position. [2] >> >> Tests added along with the dwarf parser patch are all tested and passed on x86_64, aarch64, and riscv64. >> Running a tier1 sanity test now. >> >> Thanks, >> Xiaolin >> >> [1] https://github.com/bminor/binutils-gdb/commit/1a7c41d5ece7d0d1aa77d8019ee46f03181854fa >> [2] https://github.com/bminor/binutils-gdb/blob/fd320c4c29c9a1915d24a68a167a5fd6d2c27e60/binutils/dwarf.c#L7594 > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Add the assertion back > > We could additionally assert that the real terminating entry is indeed `(0, 0)` as specified in the spec. But you need to check if that is the case on RISC-V. > > Thanks for your time for reviewing! In fact, before submitting this PR I had one assertion to assert when the new `is_terminating_entry()` returns true, we must encounter a pair of zero. Though I removed it at last because I wanted to fully mirror what Binutils is doing. Of course, I think the assertion is reasonable and on RISC-V it is the same case that the real terminating entry is a pair of 0. So going to add that assertion back. Yeah, I think so, too - thanks for adding it back! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10758 From alanb at openjdk.org Wed Oct 19 13:21:58 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 19 Oct 2022 13:21:58 GMT Subject: RFR: 8280131: jcmd reports "Module jdk.jfr not found." when "jdk.management.jfr" is missing In-Reply-To: References: Message-ID: On Mon, 17 Oct 2022 09:25:57 GMT, Erik Gahlin wrote: > Could I have a review of a PR that ensures JFR can be used when only the jdk.jfr module is present in an image. > > The behavior is similar to how -javaagent adds the java.instrument module and "jcmd PID ManagementAgent.status" loads the jdk.management.agent module. > > TestJfrJavaBase.java is replaced with TestModularImage.java. The former test could not be used since the jdk.jfr module is now added to the module graph when -XX:StartFlightRecording is specified. 
> > Testing: tier1-3 + test/jdk/jdk/jfr > > Thanks > Erik src/hotspot/share/jfr/dcmd/jfrDcmds.cpp line 69: > 67: } > 68: > 69: static bool invalid_state(outputStream* out, TRAPS) { The changes look okay to me except that "invalid_state" isn't a clear name for function that tries to ensure that the jdk.jfr module is loaded. Up to you, but I think it would be clear if it were renamed load_jfr that returns disabled when disabled or the jdk.jfr module cannot be loaded. ------------- PR: https://git.openjdk.org/jdk/pull/10723 From alanb at openjdk.org Wed Oct 19 13:41:46 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 19 Oct 2022 13:41:46 GMT Subject: RFR: 8295470: Update openjdk.java.net => openjdk.org URLs in test code In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 11:55:06 GMT, Magnus Ihse Bursie wrote: > This is a continuation of the effort to update all our URLs to the new top-level domain. > > This patch updates (most) URLs in testing code. There still exists references to openjdk.java.net, but that are not strictly used as normal URLs, which I deemed need special care, so I left them out of this one, which is more of a straight-forward search and replace. > > I have manually verified that the links work (or points to bugs.openjdk.org and looks sane; I did not click on all those). I have replaced `http` with `https`. I have replaced links to specific commits on the mercurial server with links to the corresponding commits in the new git repos. This updates several tests that are also in Doug Lea's CVS for j.u.concurrent. That should be okay, just need to get them in sync at some point. ------------- PR: https://git.openjdk.org/jdk/pull/10744 From stuefe at openjdk.org Wed Oct 19 13:47:15 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 19 Oct 2022 13:47:15 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v3] In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 09:00:14 GMT, Johan Sj?len wrote: >> Hi! >> >> This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary -Xlog:disable src/hotspot/share/runtime/deoptimization.cpp line 299: > 297: deoptimized_objects = deoptimized_objects || relocked; > 298: #ifndef PRODUCT > 299: LogMessage(deoptimization) lm; Drive-by: Does this not already incur costs? Does LogMessage not contain an internal buffer it allocates? ------------- PR: https://git.openjdk.org/jdk/pull/10645 From stuefe at openjdk.org Wed Oct 19 13:49:13 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 19 Oct 2022 13:49:13 GMT Subject: RFR: JDK-8293114: GC should trim the native heap [v3] In-Reply-To: <23KpPM4oPV6F1nz3g5CvIqvuX-ANcsMH4GuVNXjR-Lw=.b8d0fa2d-bb85-4899-8e21-f68ea64b988d@github.com> References: <23KpPM4oPV6F1nz3g5CvIqvuX-ANcsMH4GuVNXjR-Lw=.b8d0fa2d-bb85-4899-8e21-f68ea64b988d@github.com> Message-ID: > This RFE adds the option to auto-trim the Glibc heap as part of the GC cycle. If the VM process suffered high temporary malloc spikes (regardless whether from JVM- or user code), this could recover significant amounts of memory. > > We discussed this a year ago [1], but the item got pushed to the bottom of my work pile, therefore it took longer than I thought. > > ### Motivation > > The Glibc allocator is reluctant to return memory to the OS, much more so than other allocators. 
Temporary malloc spikes often carry over as permanent RSS increase. > > Note that C-heap retention is difficult to observe. Since it is freed memory, it won't show up in NMT, it is just a part of private RSS. > > Theoretically, retained memory is not lost since it will be reused by future mallocs. Retaining memory is therefore a bet on the future behavior of the app. The allocator bets on the application needing memory in the near future, and to satisfy that need via malloc. > > But an app's malloc load can fluctuate wildly, with temporary spikes and long idle periods. And if the app rolls its own custom allocators atop of mmap, as hotspot does, a lot of that memory cannot be reused even though it counts toward its memory footprint. > > To help, Glibc exports an API to trim the C-heap: `malloc_trim(3)`. With JDK 18 [2], SAP contributed a new jcmd command to *manually* trim the C-heap on Linux. This RFE adds a complementary way to trim automatically. > > #### Is this even a problem? > > Do we have high malloc spikes in the JVM process? We assume that malloc load from hotspot is usually low since hotspot typically clusters allocations into custom areas - metaspace, code heap, arenas. > > But arenas are subject to Glibc mem retention too. I was surprised by that since I assumed 32k arena chunks were too big to be subject of Glibc retention. But I saw in experiments that high arena peaks often cause lasting RSS increase. > > And of course, both hotspot and JDK do a lot of finer-granular mallocs outside of custom allocators. > > But many cases of high memory retention in Glibc I have seen in third-party JNI code. Libraries allocate large buffers via malloc as temporary buffers. In fact, since we introduced the jcmd "System.trim_native_heap", some of our customers started to call this command periodically in scripts to counter these issues. > > Therefore I think while high malloc spikes are atypical for a JVM process, they can happen. Having a way to auto-trim the native heap makes sense. > > ### When should we trim? > > We want to trim when we know there is a lull in malloc activity coming. But we have no knowledge of the future. > > We could build a heuristic based on malloc frequency. But on closer inspection that is difficult. We cannot use NMT, since NMT has no complete picture (only knows hotspot) and is usually disabled in production anyway. The only way to get *all* mallocs would be to use Glibc malloc hooks. We have done so in desperate cases at SAP, but Glibc removed malloc hooks in 2.35. It would be a messy solution anyway; best to avoid it. > > The next best thing is synchronizing with the larger C-heap users in the VM: compiler and GC. But compiler turns out not to be such a problem, since the compiler uses arenas, and arena chunks are buffered in a free pool with a five-second delay. That means compiler activity that happens in bursts, like at VM startup, will just shuffle arena chunks around from/to the arena free pool, never bothering to call malloc or free. > > That leaves the GC, which was also the experts' recommendation in last year's discussion [1]. Most GCs do uncommit, and trimming the native heap fits well into this. And we want to time the trim to not get into the way of a GC. Plus, integrating trims into the GC cycle lets us reuse GC logging and timing, thereby making RSS changes caused by trim-native visible to the analyst. 
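To make the glibc side of this concrete, a minimal sketch of a trim probe -- not code from the patch -- assuming glibc's <malloc.h> with malloc_trim and mallinfo2 (glibc 2.33 or newer; older releases only provide mallinfo):

    #include <malloc.h>
    #include <stdio.h>

    // Illustration only: report glibc free-chunk bytes before and after one trim.
    static void trim_native_heap_once(void) {
      struct mallinfo2 before = mallinfo2();
      int released = malloc_trim(0);            // pad of 0: release as much as possible
      struct mallinfo2 after = mallinfo2();
      printf("malloc_trim %s; free chunk bytes %zu -> %zu\n",
             released ? "released memory to the OS" : "was a no-op",
             before.fordblks, after.fordblks);
    }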
> > > ### How it works: > > Patch adds new options (experimental for now, and shared among all GCs): > > > -XX:+GCTrimNativeHeap > -XX:GCTrimNativeHeapInterval= (defaults to 60) > > > `GCTrimNativeHeap` is off by default. If enabled, it will cause the VM to trim the native heap on full GCs as well as periodically. The period is defined by `GCTrimNativeHeapInterval`. Periodic trimming can be completely switched off with `GCTrimNativeHeapInterval=0`; in that case, we will only trim on full GCs. > > ### Examples: > > This is an artificial test that causes two high malloc spikes with long idle periods. Observe how RSS recovers with trim but stays up without trim. The trim interval was set to 15 seconds for the test, and no GC was invoked here; this is periodic trimming. > > ![alloc-test](http://cr.openjdk.java.net/~stuefe/other/autotrim/rss-all-collectors.png) > > (See here for parameters: [run script](http://cr.openjdk.java.net/~stuefe/other/autotrim/run-all.sh) ) > > Spring pet clinic boots up, then idles. Once with, once without trim, with the trim interval at 60 seconds default. Of course, if it were actually doing something instead of idling, trim effects would be smaller. But the point of trimming is to recover memory in idle periods. > > ![petclinic bootup](http://cr.openjdk.java.net/~stuefe/other/autotrim/spring-petclinic-rss-with-and-without-trim.png)) > > (See here for parameters: [run script](http://cr.openjdk.java.net/~stuefe/other/autotrim/run-petclinic-boot.sh) ) > > > > ### Implementation > > One problem I faced when implementing this was that trimming was non-interruptable. GCs usually split the uncommit work into smaller portions, which is impossible for `malloc_trim()`. > > So very slow trims could introduce longer GC pauses. I did not want this, therefore I implemented two ways to trim: > 1) GCs can opt to trim asynchronously. In that case, a `NativeTrimmer` thread runs on behalf of the GC and takes care of all trimming. The GC just nudges the `NativeTrimmer` at the end of its GC cycle, but the trim itself runs concurrently. > 2) GCs can do the trim inside their own thread, synchronously. It will have to wait until the trim is done. > > (1) has the advantage of giving us periodic trims even without GC activity (Shenandoah does this out of the box). > > #### Serial > > Serial does the trimming synchronously as part of a full GC, and only then. I did not want to spawn a separate thread for the SerialGC. Therefore Serial is the only GC that does not offer periodic trimming, it just trims on full GC. > > #### Parallel, G1, Z > > All of them do the trimming asynchronously via `NativeTrimmer`. They schedule the native trim at the end of a full collection. They also pause the trimming at the beginning of a cycle to not trim during GCs. > > #### Shenandoah > > Shenandoah does the trimming synchronously in its service thread, similar to how it handles uncommits. Since the service thread already runs concurrently and continuously, it can do periodic trimming; no need to spin a new thread. And this way we can reuse the Shenandoah timing classes. > > ### Patch details > > - adds three new functions to the `os` namespace: > - `os::trim_native_heap()` implementing trim > - `os::can_trim_native_heap()` and `os::should_trim_native_heap()` to return whether platform supports trimming resp. whether the platform considers trimming to be useful. 
> - replaces implementation of the cmd "System.trim_native_heap" with the new `os::trim_native_heap` > - provides a new wrapper function wrapping the tedious `mallinfo()` vs `mallinfo2()` business: `os::Linux::get_mallinfo()` > - adds a GC-shared utility class, `GCTrimNative`, that takes care of trimming and GC-logging and houses the `NativeTrimmer` thread class. > - adds a regression test > > > ### Tests > > Tested older Glibc (2.31), and newer Glibc (2.35) (`mallinfo()` vs` mallinfo2()`), on Linux x64. > > The rest of the tests will be done by GHA and in our SAP nightlies. > > > ### Remarks > > #### How about other allocators? > > I have seen this retention problem mainly with the Glibc and the AIX libc. Muslc returns memory more eagerly to the OS. I also tested with jemalloc and found it also reclaims more aggressively, therefore I don't think MacOS or BSD are affected that much by retention either. > > #### Trim costs? > > Trim-native is a tradeoff between memory and performance. We pay > - The cost to do the trim depends on how much is trimmed. Time ranges on my machine between < 1ms for no-op trims, to ~800ms for 32GB trims. > - The cost for re-acquiring the memory, should the memory be needed again, is the second cost factor. > > #### Predicting malloc_trim effects? > > `ShenandoahUncommit` avoids uncommits if they are not necessary, thus avoiding work and gc log spamming. I liked that and tried to follow that example. Tried to devise a way to predict the effect trim could have based on allocator info from mallinfo(3). That was quite frustrating since the documentation was confusing and I had to do a lot of experimenting. In the end, I came up with a heuristic to prevent obviously pointless trim attempts; see `os::should_trim_native_heap()`. I am not completely happy with it. > > #### glibc.malloc.trim_threshold? > > glibc has a tunable that looks like it could influence the willingness of Glibc to return memory to the OS, the "trim_threshold". In practice, I could not get it to do anything useful. Regardless of the setting, it never seemed to influence the trimming behavior. Even if it would work, I'm not sure we'd want to use that, since by doing malloc_trim manually we can space out the trims as we see fit, instead of paying the trim price for free(3). > > > - [1] https://mail.openjdk.org/pipermail/hotspot-dev/2021-August/054323.html > - [2] https://bugs.openjdk.org/browse/JDK-8269345 Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Make test more fault tolerant - Merge branch 'master' into JDK-8293114-GC-trim-native - reduce test runtime on slow hardware - make tests more stable on slow hardware - wip - Fixes and Simplifications - some simplifications - trim-native ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10085/files - new: https://git.openjdk.org/jdk/pull/10085/files/35938c69..40b3e362 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10085&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10085&range=01-02 Stats: 136931 lines in 2979 files changed: 72406 ins; 48210 del; 16315 mod Patch: https://git.openjdk.org/jdk/pull/10085.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10085/head:pull/10085 PR: https://git.openjdk.org/jdk/pull/10085 From egahlin at openjdk.org Wed Oct 19 14:26:07 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 19 Oct 2022 14:26:07 GMT Subject: RFR: 8280131: jcmd reports "Module jdk.jfr not found." when "jdk.management.jfr" is missing In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 13:18:21 GMT, Alan Bateman wrote: >> Could I have a review of a PR that ensures JFR can be used when only the jdk.jfr module is present in an image. >> >> The behavior is similar to how -javaagent adds the java.instrument module and "jcmd PID ManagementAgent.status" loads the jdk.management.agent module. >> >> TestJfrJavaBase.java is replaced with TestModularImage.java. The former test could not be used since the jdk.jfr module is now added to the module graph when -XX:StartFlightRecording is specified. >> >> Testing: tier1-3 + test/jdk/jdk/jfr >> >> Thanks >> Erik > > src/hotspot/share/jfr/dcmd/jfrDcmds.cpp line 69: > >> 67: } >> 68: >> 69: static bool invalid_state(outputStream* out, TRAPS) { > > The changes look okay to me except that "invalid_state" isn't a clear name for function that tries to ensure that the jdk.jfr module is loaded. Up to you, but I think it would be clear if it were renamed load_jfr that returns disabled when disabled or the jdk.jfr module cannot be loaded. I agree. I have a large refactoring of jfrDcmd.cpp coming up. I will clean up names and logic in that PR. ------------- PR: https://git.openjdk.org/jdk/pull/10723 From bkilambi at openjdk.org Wed Oct 19 14:27:34 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 19 Oct 2022 14:27:34 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v3] In-Reply-To: References: Message-ID: > Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - > > eor a, a, b > eor a, a, c > > can be optimized to single instruction - `eor3 a, b, c` > > This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - > > > Benchmark gain > TestEor3.test1Int 10.87% > TestEor3.test1Long 8.84% > TestEor3.test2Int 21.68% > TestEor3.test2Long 21.04% > > > The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. 
Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Changed the modifier order preference in JTREG test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10407/files - new: https://git.openjdk.org/jdk/pull/10407/files/6df4f014..449524ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10407&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10407&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10407.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10407/head:pull/10407 PR: https://git.openjdk.org/jdk/pull/10407 From bkilambi at openjdk.org Wed Oct 19 14:27:39 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 19 Oct 2022 14:27:39 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v2] In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 20:24:07 GMT, Andrey Turbanov wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Modified JTREG test to include feature constraints > > test/hotspot/jtreg/compiler/vectorization/TestEor3AArch64.java line 44: > >> 42: public class TestEor3AArch64 { >> 43: >> 44: private final static int LENGTH = 2048; > > Suggestion: > > private static final int LENGTH = 2048; Hello, thank you for your feedback. I have made the suggested changes and uploaded a new patch. Please review .. ------------- PR: https://git.openjdk.org/jdk/pull/10407 From alanb at openjdk.org Wed Oct 19 14:31:59 2022 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 19 Oct 2022 14:31:59 GMT Subject: RFR: 8280131: jcmd reports "Module jdk.jfr not found." when "jdk.management.jfr" is missing In-Reply-To: References: Message-ID: <8kq3EtXOjUZ2l0z1Rs1HOo6-ilALnhC3MvMUn2dPVPg=.e47a98a2-6221-402f-bbfc-103ac7646f1d@github.com> On Mon, 17 Oct 2022 09:25:57 GMT, Erik Gahlin wrote: > Could I have a review of a PR that ensures JFR can be used when only the jdk.jfr module is present in an image. > > The behavior is similar to how -javaagent adds the java.instrument module and "jcmd PID ManagementAgent.status" loads the jdk.management.agent module. > > TestJfrJavaBase.java is replaced with TestModularImage.java. The former test could not be used since the jdk.jfr module is now added to the module graph when -XX:StartFlightRecording is specified. > > Testing: tier1-3 + test/jdk/jdk/jfr > > Thanks > Erik Marked as reviewed by alanb (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10723 From coleenp at openjdk.org Wed Oct 19 16:13:50 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Oct 2022 16:13:50 GMT Subject: RFR: 8293939: Move continuation_enter_setup and friends Message-ID: Please review this trivial change. I moved these functions to sharedRuntime_.cpp just before they are used. Tested with tier1 on x86 and aarch64, and locally test/hotspot/jtreg:hotspot_loom. 
------------- Commit messages: - 8293939: Move continuation_enter_setup and friends Changes: https://git.openjdk.org/jdk/pull/10770/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10770&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293939 Stats: 368 lines in 5 files changed: 155 ins; 202 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/10770.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10770/head:pull/10770 PR: https://git.openjdk.org/jdk/pull/10770 From jbhateja at openjdk.org Wed Oct 19 16:30:04 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 19 Oct 2022 16:30:04 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions In-Reply-To: References: Message-ID: On Wed, 5 Oct 2022 21:28:26 GMT, vpaprotsk wrote: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, I left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare the intrinsic and Java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ± 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ± 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ± 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ± 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ± 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ± 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ± 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ± 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s Some initial assembler-level comments. src/hotspot/cpu/x86/assembler_x86.cpp line 5484: > 5482: > 5483: void Assembler::evpunpckhqdq(XMMRegister dst, KRegister mask, XMMRegister src1, XMMRegister src2, bool merge, int vector_len) { > 5484: assert(UseAVX > 2, "requires AVX512F"); Please replace the flag check with an EVEX feature check. src/hotspot/cpu/x86/assembler_x86.cpp line 7831: > 7829: > 7830: void Assembler::vpandq(XMMRegister dst, XMMRegister nds, Address src, int vector_len) { > 7831: assert(VM_Version::supports_evex(), ""); The assertion should check for the existence of AVX512VL for non-512-bit vectors.
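A sketch of what such a check could look like -- assuming the existing VM_Version::supports_avx512vl() predicate and the Assembler's AVX_512bit encoding constant; this is only an illustration, not the final wording of the fix:

    // Inside e.g. Assembler::vpandq: sub-512-bit EVEX forms additionally need AVX512VL.
    assert(vector_len == AVX_512bit ? VM_Version::supports_evex()
                                    : VM_Version::supports_avx512vl(),
           "requires AVX512F for 512-bit vectors and AVX512VL otherwise");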
src/hotspot/cpu/x86/assembler_x86.cpp line 7958: > 7956: > 7957: void Assembler::vporq(XMMRegister dst, XMMRegister nds, Address src, int vector_len) { > 7958: assert(VM_Version::supports_evex(), ""); Same as above src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 1960: > 1958: address StubGenerator::generate_poly1305_masksCP() { > 1959: StubCodeMark mark(this, "StubRoutines", "generate_poly1305_masksCP"); > 1960: address start = __ pc(); You may use [align64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L777) here, like ------------- PR: https://git.openjdk.org/jdk/pull/10582 From coleenp at openjdk.org Wed Oct 19 18:05:24 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Oct 2022 18:05:24 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v3] In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 09:00:14 GMT, Johan Sj?len wrote: >> Hi! >> >> This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary -Xlog:disable Looks good to me. src/hotspot/share/runtime/vframeArray.cpp line 460: > 458: vframe* f = vframe::new_vframe(iframe(), &map, thread); > 459: f->print_on(&ls); > 460: if (WizardMode && Verbose) method()->print_codes_on(&ls); I guess David already commented on this but we're trying to remove WizardMode and Verbose. Can this be: if (lm.is_trace()) method->print_codes_on(lm); ? Also methodHandle doesn't require () because -> turns it into Method*. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/10645 From coleenp at openjdk.org Wed Oct 19 18:05:24 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Oct 2022 18:05:24 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v3] In-Reply-To: <9hxX0Gr6WNWv1iDBmEEAbyW0_rm40k5j3wFWehRuD1s=.66c7a911-176d-47ab-84e4-1d013a521503@github.com> References: <_mlD5HyVj54tSXfOJY7bQi7NHxiwk2hhRpCkja2xHfg=.890bcecc-64dc-4d41-a656-f27aef4cea29@github.com> <9hxX0Gr6WNWv1iDBmEEAbyW0_rm40k5j3wFWehRuD1s=.66c7a911-176d-47ab-84e4-1d013a521503@github.com> Message-ID: On Thu, 13 Oct 2022 02:32:38 GMT, David Holmes wrote: >> This is virtual, taken from `ResourceObj`. The `override` indicates this, I'm basing this on: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rh-override > > Thanks - missed that. I don't know if we've added 'override' to our source base yet. ------------- PR: https://git.openjdk.org/jdk/pull/10645 From iklam at openjdk.org Wed Oct 19 18:39:25 2022 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 19 Oct 2022 18:39:25 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v4] In-Reply-To: References: Message-ID: <9tmGKUHiwVATyrTUwn8Po2tl4Ba5NqmwARD_Vh4bbro=.1215e5cb-7c33-4aa1-9091-215175508c02@github.com> > Some `JVM_CONSTANT_Class` entries are guaranteed to resolve to the same value at both CDS dump time and run time: > > - Classes that are resolved during `vmClasses::resolve_all()`. These classes cannot be replaced by JVMTI agents at run time. > - Supertypes -- at run time, a class `C` can be loaded from the CDS archive only if all of `C`'s super types are also loaded from the CDS archive. Therefore, we know that a `JVM_CONSTANT_Class` reference to a supertype of `C` must resolved to the same value at both CDS dump time and run time. 
> > By doing the resolution at dump time, we can speed up run time start-up by a little bit. > > The `ClassPrelinker` class added by this PR will also be used in future REFs for pre-resolving other constant pool entries. The ultimate goal is to resolve `invokedynamic` and `invokehandle` so we can significantly improve the start-up time of features such as Lambda expressions and String concatenation. See [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - @coleenp comments: changed to AllStatic - Merge branch 'master' into 8293979-resolve-class-references-at-dumptime - fixed product build - @coleenp comments - Merge branch 'master' into 8293979-resolve-class-references-at-dumptime - 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10330/files - new: https://git.openjdk.org/jdk/pull/10330/files/980802d6..df53a7b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10330&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10330&range=02-03 Stats: 3391 lines in 582 files changed: 1749 ins; 122 del; 1520 mod Patch: https://git.openjdk.org/jdk/pull/10330.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10330/head:pull/10330 PR: https://git.openjdk.org/jdk/pull/10330 From iklam at openjdk.org Wed Oct 19 18:39:25 2022 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 19 Oct 2022 18:39:25 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v4] In-Reply-To: References: <8iNwjxGcRg1uK4X6s7Bk0Sv4qWbYiE_2-Avo3X65xc4=.5583d1df-b49a-419a-aee6-f3f6fffb5b63@github.com> Message-ID: On Tue, 18 Oct 2022 20:38:05 GMT, Coleen Phillimore wrote: >> This table is used to check if we have already worked on the class: >> >> >> bool first_time; >> _processed_classes.put_if_absent(ik, &first_time); >> if (!first_time) { >> return; >> } > > Oh, ok I missed that. Can you add a comment there about what you're doing and why? Done ------------- PR: https://git.openjdk.org/jdk/pull/10330 From iklam at openjdk.org Wed Oct 19 18:39:26 2022 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 19 Oct 2022 18:39:26 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v3] In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 20:43:06 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed product build > > src/hotspot/share/cds/classPrelinker.hpp line 51: > >> 49: // if all of its supertypes are loaded from the CDS archive. >> 50: class ClassPrelinker : public StackObj { >> 51: typedef ResourceHashtable ClassesTable; > > You can use the new shiny using ClassesTable = ResourceHashtable instead. Done. ------------- PR: https://git.openjdk.org/jdk/pull/10330 From shade at openjdk.org Wed Oct 19 18:51:03 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Oct 2022 18:51:03 GMT Subject: RFR: 8294468: Fix char-subscripts warnings in Hotspot [v2] In-Reply-To: References: Message-ID: On Mon, 17 Oct 2022 18:08:21 GMT, Aleksey Shipilev wrote: >> There seem to be the only place in Hotspot where this warning fires, yet the warning is disabled wholesale for Hotspot. This is not good. 
>> >> I can trace the addition of char-subscripts exclusion to [JDK-8211029](https://bugs.openjdk.org/browse/JDK-8211029) (Sep 2018). The only place in Hotspot where in fires is present from the initial load (2007). >> >> The underlying problem that this warning tells us about is that `char` might be signed on some platforms, so we can potentially access the negative index. It is not a bug in our current code, that bounds the value of `k` under `MAXID-1`, which is `19`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` >> - [x] The build matrix of: >> - GCC 10 >> - {i686, x86_64, aarch64, powerpc64le, s390x, armhf, riscv64} >> - {server} >> - {release, fastdebug} > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into JDK-8294468-warning-char-subscripts > - Fix The GHA failure is unrelated. ------------- PR: https://git.openjdk.org/jdk/pull/10455 From kvn at openjdk.org Wed Oct 19 18:51:07 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 19 Oct 2022 18:51:07 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v2] In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 10:16:41 GMT, Johan Sj?len wrote: > Old `-XX:+WizardMode -XX:+Verbose -XX:+PrintDeoptimizationDetails`: You only need `-XX:+PrintDeoptimizationDetails` to see old output. `-XX:+WizardMode -XX:+Verbose` will mess it up. I don't see next output in your example after `- method ` last line: {method} {0x0000000800009b10} 'equals' '(Ljava/lang/Object;)Z' in 'java/lang/String' bci: 8 locals: 0 "true"{0x00000007ff84c3a0} <0x00000007ff84c3a0> 1 NULL <0x0000000000000000> 2 0 (int) 0.000000 (float) 0 (hex) expressions: 0 NULL <0x0000000000000000> ------------- PR: https://git.openjdk.org/jdk/pull/10645 From shade at openjdk.org Wed Oct 19 18:54:21 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Oct 2022 18:54:21 GMT Subject: Integrated: 8294468: Fix char-subscripts warnings in Hotspot In-Reply-To: References: Message-ID: On Tue, 27 Sep 2022 17:27:03 GMT, Aleksey Shipilev wrote: > There seem to be the only place in Hotspot where this warning fires, yet the warning is disabled wholesale for Hotspot. This is not good. > > I can trace the addition of char-subscripts exclusion to [JDK-8211029](https://bugs.openjdk.org/browse/JDK-8211029) (Sep 2018). The only place in Hotspot where in fires is present from the initial load (2007). > > The underlying problem that this warning tells us about is that `char` might be signed on some platforms, so we can potentially access the negative index. It is not a bug in our current code, that bounds the value of `k` under `MAXID-1`, which is `19`. > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] The build matrix of: > - GCC 10 > - {i686, x86_64, aarch64, powerpc64le, s390x, armhf, riscv64} > - {server} > - {release, fastdebug} This pull request has now been integrated. 
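For reference, the classic shape of the char-subscripts hazard the PR text describes, and the usual guard -- illustration only, not the fix that was actually integrated:

    // Plain 'char' may be signed, so a raw subscript can become negative.
    static int lookup(char c, const int table[256]) {
      return table[static_cast<unsigned char>(c)];   // cast keeps the index in [0, 255]
    }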
Changeset: ceb5b089 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/ceb5b08964e34dfae3819257e5df460f24f92a78 Stats: 4 lines in 2 files changed: 1 ins; 2 del; 1 mod 8294468: Fix char-subscripts warnings in Hotspot Reviewed-by: dholmes, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/10455 From shade at openjdk.org Wed Oct 19 19:11:42 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Oct 2022 19:11:42 GMT Subject: RFR: 8294438: Fix misleading-indentation warnings in hotspot [v3] In-Reply-To: References: Message-ID: > There are number of places where misleading-indentation is reported by GCC. Currently, the warning is disabled for the entirety of Hotspot, which is not good. > > C1 does an unusual style here. Changing it globally would touch a lot of lines. Instead of doing that, I fit the existing style while also resolving the warnings. Note this actually solves a bug in `lir_alloc_array`, where `do_temp` are called without a check. > > Build-tested this with product of: > - GCC 10 > - {i686, x86_64, aarch64, powerpc64le, s390x, armhf, riscv64} > - {server, zero} > - {release, fastdebug} > > Linux x86_64 fastdebug `tier1` is fine. Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' into JDK-8294438-misleading-indentation - Merge branch 'master' into JDK-8294438-misleading-indentation - Merge branch 'master' into JDK-8294438-misleading-indentation - Also javaClasses.cpp - Fix ------------- Changes: https://git.openjdk.org/jdk/pull/10444/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10444&range=02 Stats: 56 lines in 5 files changed: 7 ins; 20 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/10444.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10444/head:pull/10444 PR: https://git.openjdk.org/jdk/pull/10444 From kvn at openjdk.org Wed Oct 19 19:18:03 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 19 Oct 2022 19:18:03 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v3] In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 09:00:14 GMT, Johan Sj?len wrote: >> Hi! >> >> This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary -Xlog:disable I see in new output next lines repeated instead of missing lines from old output: [0.703s][debug][deoptimization] Interpreted frame (sp=0x00007fb5bbf165b8 unextended sp=0x00007fb5bbf165b8, fp=0x00007fb5bbf16608, real_fp=0x00007fb5bbf16608, pc=0x00007fb5a43c26a0) [0.703s][debug][deoptimization] ~deoptimization entry points [0x00007fb5a43c26a0, 0x00007fb5a43c5b38] 13464 bytes [0.703s][debug][deoptimization] BufferBlob (0x00007fb5a43a3510) used for Interpreter ------------- PR: https://git.openjdk.org/jdk/pull/10645 From dlong at openjdk.org Wed Oct 19 19:31:00 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 19 Oct 2022 19:31:00 GMT Subject: RFR: 8293939: Move continuation_enter_setup and friends In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 16:04:04 GMT, Coleen Phillimore wrote: > Please review this trivial change. I moved these functions to sharedRuntime_.cpp just before they are used. > Tested with tier1 on x86 and aarch64, and locally test/hotspot/jtreg:hotspot_loom. Looks good. ------------- Marked as reviewed by dlong (Reviewer). 
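Coming back to the PrintDeoptimizationDetails port discussed above: the usual way such a develop-flag printout is mapped onto UL looks roughly like this (a schematic of the common idiom, not the exact patch), after which the output is enabled with -Xlog:deoptimization=debug:

    // Old style, guarded by the develop flag:
    //   if (PrintDeoptimizationDetails) {
    //     tty->print_cr("DEOPT PACKING thread=" INTPTR_FORMAT, p2i(thread));
    //   }

    // UL style: the tag set and level replace the flag.
    log_debug(deoptimization)("DEOPT PACKING thread=" INTPTR_FORMAT, p2i(thread));

    // Multi-line dumps that go through print_on(outputStream*) are wrapped in a
    // LogStream so the existing printing code can be reused:
    LogTarget(Debug, deoptimization) lt;
    if (lt.is_enabled()) {
      LogStream ls(lt);
      vf->print_on(&ls);
    }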
PR: https://git.openjdk.org/jdk/pull/10770 From coleenp at openjdk.org Wed Oct 19 19:53:53 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Oct 2022 19:53:53 GMT Subject: RFR: 8293939: Move continuation_enter_setup and friends In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 16:04:04 GMT, Coleen Phillimore wrote: > Please review this trivial change. I moved these functions to sharedRuntime_.cpp just before they are used. > Tested with tier1 on x86 and aarch64, and locally test/hotspot/jtreg:hotspot_loom. Thanks Dean. ------------- PR: https://git.openjdk.org/jdk/pull/10770 From pchilanomate at openjdk.org Wed Oct 19 20:11:53 2022 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 19 Oct 2022 20:11:53 GMT Subject: RFR: 8293939: Move continuation_enter_setup and friends In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 16:04:04 GMT, Coleen Phillimore wrote: > Please review this trivial change. I moved these functions to sharedRuntime_.cpp just before they are used. > Tested with tier1 on x86 and aarch64, and locally test/hotspot/jtreg:hotspot_loom. Looks good to me. ------------- Marked as reviewed by pchilanomate (Reviewer). PR: https://git.openjdk.org/jdk/pull/10770 From coleenp at openjdk.org Wed Oct 19 20:11:53 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Oct 2022 20:11:53 GMT Subject: RFR: 8293939: Move continuation_enter_setup and friends In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 16:04:04 GMT, Coleen Phillimore wrote: > Please review this trivial change. I moved these functions to sharedRuntime_.cpp just before they are used. > Tested with tier1 on x86 and aarch64, and locally test/hotspot/jtreg:hotspot_loom. Thanks Patricio. ------------- PR: https://git.openjdk.org/jdk/pull/10770 From coleenp at openjdk.org Wed Oct 19 20:16:43 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 19 Oct 2022 20:16:43 GMT Subject: Integrated: 8293939: Move continuation_enter_setup and friends In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 16:04:04 GMT, Coleen Phillimore wrote: > Please review this trivial change. I moved these functions to sharedRuntime_.cpp just before they are used. > Tested with tier1 on x86 and aarch64, and locally test/hotspot/jtreg:hotspot_loom. This pull request has now been integrated. Changeset: 017e7988 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/017e7988b197427f6b464303788a418a1d892ab9 Stats: 368 lines in 5 files changed: 155 ins; 202 del; 11 mod 8293939: Move continuation_enter_setup and friends Reviewed-by: dlong, pchilanomate ------------- PR: https://git.openjdk.org/jdk/pull/10770 From coleenp at openjdk.org Thu Oct 20 00:18:47 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Oct 2022 00:18:47 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v4] In-Reply-To: <9tmGKUHiwVATyrTUwn8Po2tl4Ba5NqmwARD_Vh4bbro=.1215e5cb-7c33-4aa1-9091-215175508c02@github.com> References: <9tmGKUHiwVATyrTUwn8Po2tl4Ba5NqmwARD_Vh4bbro=.1215e5cb-7c33-4aa1-9091-215175508c02@github.com> Message-ID: On Wed, 19 Oct 2022 18:39:25 GMT, Ioi Lam wrote: >> Some `JVM_CONSTANT_Class` entries are guaranteed to resolve to the same value at both CDS dump time and run time: >> >> - Classes that are resolved during `vmClasses::resolve_all()`. These classes cannot be replaced by JVMTI agents at run time. 
>> - Supertypes -- at run time, a class `C` can be loaded from the CDS archive only if all of `C`'s super types are also loaded from the CDS archive. Therefore, we know that a `JVM_CONSTANT_Class` reference to a supertype of `C` must be resolved to the same value at both CDS dump time and run time. >> >> By doing the resolution at dump time, we can speed up run time start-up by a little bit. >> >> The `ClassPrelinker` class added by this PR will also be used in future RFEs for pre-resolving other constant pool entries. The ultimate goal is to resolve `invokedynamic` and `invokehandle` so we can significantly improve the start-up time of features such as Lambda expressions and String concatenation. See [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - @coleenp comments: changed to AllStatic > - Merge branch 'master' into 8293979-resolve-class-references-at-dumptime > - fixed product build > - @coleenp comments > - Merge branch 'master' into 8293979-resolve-class-references-at-dumptime > - 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time Maybe the third read is the charm but this makes sense to me, and looks good. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/10330 From amenkov at openjdk.org Thu Oct 20 01:06:57 2022 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 20 Oct 2022 01:06:57 GMT Subject: RFR: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 01:20:15 GMT, Serguei Spitsyn wrote: > The JDI ClassUnloadEvent events are synthesized by the JDWP agent from the JVM TI ObjectFree events. > The JVM TI ObjectFree events are flushed when the JVM TI SetEventNotificationMode is used to disable the ObjectFree events. It is not very helpful for the JDWP agent as the ObjectFree events are always enabled. > The fix is to flush all pending ObjectFree events at the VM shutdown. > > Testing: > > All mach5 jobs with JVMTI/JDI tests and tiers 1-6 were successfully passed on 3 debug platforms. Marked as reviewed by amenkov (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10736 From sspitsyn at openjdk.org Thu Oct 20 01:19:49 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 20 Oct 2022 01:19:49 GMT Subject: RFR: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 01:20:15 GMT, Serguei Spitsyn wrote: > The JDI ClassUnloadEvent events are synthesized by the JDWP agent from the JVM TI ObjectFree events. > The JVM TI ObjectFree events are flushed when the JVM TI SetEventNotificationMode is used to disable the ObjectFree events. It is not very helpful for the JDWP agent as the ObjectFree events are always enabled. > The fix is to flush all pending ObjectFree events at the VM shutdown. > > Testing: > > All mach5 jobs with JVMTI/JDI tests and tiers 1-6 were successfully passed on 3 debug platforms. Chris and Alex, thank you for review! 
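For background on the mechanism discussed in this RFR: a plain JVMTI agent sees ObjectFree roughly as sketched below, and (per the description above) buffered ObjectFree events are only delivered when the event is disabled -- which the JDWP agent cannot do, since it keeps the event enabled, hence flushing at VM shutdown instead. This is an illustrative sketch of the standard JVMTI calls with invented function names, not the JDWP agent's code.

    #include <jvmti.h>

    // Callback invoked (possibly batched/deferred) when a tagged object is freed.
    static void JNICALL on_object_free(jvmtiEnv* jvmti, jlong tag) {
      // Synthesize a higher-level event (e.g. class unload) from the tag here.
    }

    static void setup_object_free(jvmtiEnv* jvmti) {
      jvmtiCapabilities caps = {};
      caps.can_generate_object_free_events = 1;
      jvmti->AddCapabilities(&caps);

      jvmtiEventCallbacks callbacks = {};
      callbacks.ObjectFree = &on_object_free;
      jvmti->SetEventCallbacks(&callbacks, sizeof(callbacks));
      jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_OBJECT_FREE, NULL);
    }

    // Toggling the event off is what delivers any buffered ObjectFree events --
    // workable for an ordinary agent, but not for JDWP, which never disables it.
    static void flush_by_toggle(jvmtiEnv* jvmti) {
      jvmti->SetEventNotificationMode(JVMTI_DISABLE, JVMTI_EVENT_OBJECT_FREE, NULL);
      jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_OBJECT_FREE, NULL);
    }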
------------- PR: https://git.openjdk.org/jdk/pull/10736 From sspitsyn at openjdk.org Thu Oct 20 01:21:13 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 20 Oct 2022 01:21:13 GMT Subject: Integrated: 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 01:20:15 GMT, Serguei Spitsyn wrote: > The JDI ClassUnloadEvent events are synthesized by the JDWP agent from the JVM TI ObjectFree events. > The JVM TI ObjectFree events are flushed when the JVM TI SetEventNotificationMode is used to disable the ObjectFree events. It is not very helpful for the JDWP agent as the ObjectFree events are always enabled. > The fix is to flush all pending ObjectFree events at the VM shutdown. > > Testing: > > All mach5 jobs with JVMTI/JDI tests and tiers 1-6 were successfully passed on 3 debug platforms. This pull request has now been integrated. Changeset: c5e04640 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/c5e0464098f8f7cd9c568c7b1c3a06139453eaab Stats: 9 lines in 2 files changed: 2 ins; 5 del; 2 mod 8291456: com/sun/jdi/ClassUnloadEventTest.java failed with: Wrong number of class unload events: expected 10 got 4 Reviewed-by: cjplummer, amenkov ------------- PR: https://git.openjdk.org/jdk/pull/10736 From dholmes at openjdk.org Thu Oct 20 02:18:18 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Oct 2022 02:18:18 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v2] In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 13:21:28 GMT, Johan Sj?len wrote: >> Hi! >> >> This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. > > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Fix review comments src/hotspot/share/runtime/vframe.cpp line 684: > 682: } > 683: void vframe::print_on(outputStream* st) const { > 684: _fr.print_value_on(st,NULL); I think this was changed in response to Coleen's comment. While I agree we are getting rid of `WizardMode` this PR is about `PrintDeoptimizationDetails` and this change seems unrelated to that (unless nothing else calls `vframe::print`?). ------------- PR: https://git.openjdk.org/jdk/pull/10645 From fyang at openjdk.org Thu Oct 20 03:19:53 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 20 Oct 2022 03:19:53 GMT Subject: RFR: 8295703: RISC-V: Remove implicit noreg temp register arguments in MacroAssembler Message-ID: <5eu0MwoZcdf1LjkeHllGGDGX2kR6B0RdYBk3TEsDg0s=.49814e3c-8292-48d4-a17b-4808b80b372c@github.com> This is similar to: https://bugs.openjdk.org/browse/JDK-8295257 Remove implicit `= noreg` temporary register arguments for the three methods that still have them. * `load_heap_oop` * `store_heap_oop` * `load_heap_oop_not_null` Only `load_heap_oop` is used with the implicit `= noreg` arguments. After [JDK-8293769](https://bugs.openjdk.org/browse/JDK-8293769), the GCs only use explicitly passed in registers. This will also be the case for generational ZGC, which currently requires `load_heap_oop` to provide a second temporary register. Testing: Tier1 hotspot with fastdebug build on HiFive Unmatched board. 
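The shape of the resulting API change, paraphrased for orientation (the exact parameter lists in macroAssembler_riscv.hpp may differ slightly; treat this as a sketch, not the diff):

    // Before: temporaries could be silently defaulted to noreg.
    void load_heap_oop(Register dst, Address src,
                       Register tmp1 = noreg, Register tmp2 = noreg,
                       DecoratorSet decorators = 0);

    // After: every caller must spell out which temporaries the GC barrier may
    // clobber, which matters once a barrier (e.g. generational ZGC) really
    // needs them.
    void load_heap_oop(Register dst, Address src,
                       Register tmp1, Register tmp2,
                       DecoratorSet decorators = 0);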
------------- Commit messages: - 8295703: RISC-V: Remove implicit noreg temp register arguments in MacroAssembler Changes: https://git.openjdk.org/jdk/pull/10778/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10778&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295703 Stats: 14 lines in 3 files changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/10778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10778/head:pull/10778 PR: https://git.openjdk.org/jdk/pull/10778 From thartmann at openjdk.org Thu Oct 20 05:39:04 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 20 Oct 2022 05:39:04 GMT Subject: RFR: 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() In-Reply-To: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> References: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> Message-ID: On Tue, 18 Oct 2022 17:37:30 GMT, Dean Long wrote: > This change adds a missing is_unloading() check for the callee in SharedRuntime::fixup_callers_callsite() and removes from_compiled_entry_no_trampoline() because it is no longer used. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.org/jdk/pull/10747 From dlong at openjdk.org Thu Oct 20 06:01:38 2022 From: dlong at openjdk.org (Dean Long) Date: Thu, 20 Oct 2022 06:01:38 GMT Subject: RFR: 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() In-Reply-To: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> References: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> Message-ID: On Tue, 18 Oct 2022 17:37:30 GMT, Dean Long wrote: > This change adds a missing is_unloading() check for the callee in SharedRuntime::fixup_callers_callsite() and removes from_compiled_entry_no_trampoline() because it is no longer used. Thanks Tobias. @fisk, please take a look when you get the chance. ------------- PR: https://git.openjdk.org/jdk/pull/10747 From yadongwang at openjdk.org Thu Oct 20 06:16:07 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Thu, 20 Oct 2022 06:16:07 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v6] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: <71FJ9u1Q-7crwjH4Pxd1ISRWRWXViRLQFSC93OtjsN4=.b0f9d48d-093d-4580-b87d-90b79109f80d@github.com> On Wed, 19 Oct 2022 10:28:20 GMT, Ludovic Henry wrote: >> Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. >> >> [1] https://github.com/riscv/riscv-CMOs >> [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero > > Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: > > - Explicit use of temp registers > - fixup! Add -XX:CacheLineSize= to set cache line size lgtm ------------- Marked as reviewed by yadongwang (Author). 
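To make the cbo.zero discussion easier to follow, the overall shape of a cache-line zeroing routine is sketched below in plain C++. The real change emits RISC-V instructions from the stub generator and only takes this path above BlockZeroingLowLimit; here zero_cache_line() stands in for a single cbo.zero instruction, and all names are invented for illustration.

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    static void zero_cache_line(uint8_t* p, size_t line) { std::memset(p, 0, line); }

    // 'line' is the Zicboz cache-block size; assumed to be a power of two.
    void block_zero(uint8_t* p, size_t bytes, size_t line) {
      uint8_t* end = p + bytes;

      // Head: ordinary stores up to the first cache-line boundary.
      size_t misalign = reinterpret_cast<uintptr_t>(p) & (line - 1);
      if (misalign != 0) {
        size_t head = line - misalign;
        if (head > bytes) head = bytes;
        std::memset(p, 0, head);
        p += head;
      }

      // Body: one cbo.zero per whole cache line.
      while (static_cast<size_t>(end - p) >= line) {
        zero_cache_line(p, line);
        p += line;
      }

      // Tail: ordinary stores for whatever is left.
      std::memset(p, 0, static_cast<size_t>(end - p));
    }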
PR: https://git.openjdk.org/jdk/pull/10718 From shade at openjdk.org Thu Oct 20 07:16:55 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Oct 2022 07:16:55 GMT Subject: RFR: 8294438: Fix misleading-indentation warnings in hotspot [v3] In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 19:11:42 GMT, Aleksey Shipilev wrote: >> There are number of places where misleading-indentation is reported by GCC. Currently, the warning is disabled for the entirety of Hotspot, which is not good. >> >> C1 does an unusual style here. Changing it globally would touch a lot of lines. Instead of doing that, I fit the existing style while also resolving the warnings. Note this actually solves a bug in `lir_alloc_array`, where `do_temp` are called without a check. >> >> Build-tested this with product of: >> - GCC 10 >> - {i686, x86_64, aarch64, powerpc64le, s390x, armhf, riscv64} >> - {server, zero} >> - {release, fastdebug} >> >> Linux x86_64 fastdebug `tier1` is fine. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into JDK-8294438-misleading-indentation > - Merge branch 'master' into JDK-8294438-misleading-indentation > - Merge branch 'master' into JDK-8294438-misleading-indentation > - Also javaClasses.cpp > - Fix Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/10444 From shade at openjdk.org Thu Oct 20 07:19:03 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Oct 2022 07:19:03 GMT Subject: RFR: 8295703: RISC-V: Remove implicit noreg temp register arguments in MacroAssembler In-Reply-To: <5eu0MwoZcdf1LjkeHllGGDGX2kR6B0RdYBk3TEsDg0s=.49814e3c-8292-48d4-a17b-4808b80b372c@github.com> References: <5eu0MwoZcdf1LjkeHllGGDGX2kR6B0RdYBk3TEsDg0s=.49814e3c-8292-48d4-a17b-4808b80b372c@github.com> Message-ID: On Thu, 20 Oct 2022 03:12:31 GMT, Fei Yang wrote: > This is similar to: https://bugs.openjdk.org/browse/JDK-8295257 > > Remove implicit `= noreg` temporary register arguments for the three methods that still have them. > * `load_heap_oop` > * `store_heap_oop` > * `load_heap_oop_not_null` > > Only `load_heap_oop` is used with the implicit `= noreg` arguments. > After [JDK-8293769](https://bugs.openjdk.org/browse/JDK-8293769), the GCs only use explicitly passed in registers. This will also be the case for generational ZGC. Where it currently requires `load_heap_oop` to provide a second temporary register. > > Testing: Tier1 hotspot with fastdebug build on HiFive Unmatched board. Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10778 From shade at openjdk.org Thu Oct 20 07:21:03 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Oct 2022 07:21:03 GMT Subject: Integrated: 8294438: Fix misleading-indentation warnings in hotspot In-Reply-To: References: Message-ID: On Tue, 27 Sep 2022 10:28:54 GMT, Aleksey Shipilev wrote: > There are number of places where misleading-indentation is reported by GCC. Currently, the warning is disabled for the entirety of Hotspot, which is not good. > > C1 does an unusual style here. Changing it globally would touch a lot of lines. Instead of doing that, I fit the existing style while also resolving the warnings. Note this actually solves a bug in `lir_alloc_array`, where `do_temp` are called without a check. 
> > Build-tested this with product of: > - GCC 10 > - {i686, x86_64, aarch64, powerpc64le, s390x, armhf, riscv64} > - {server, zero} > - {release, fastdebug} > > Linux x86_64 fastdebug `tier1` is fine. This pull request has now been integrated. Changeset: 545021b1 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/545021b18d6f82ac8013009939ef4e05b8ebf7ce Stats: 56 lines in 5 files changed: 7 ins; 20 del; 29 mod 8294438: Fix misleading-indentation warnings in hotspot Reviewed-by: ihse, dholmes, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/10444 From fjiang at openjdk.org Thu Oct 20 07:26:47 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 20 Oct 2022 07:26:47 GMT Subject: RFR: 8295703: RISC-V: Remove implicit noreg temp register arguments in MacroAssembler In-Reply-To: <5eu0MwoZcdf1LjkeHllGGDGX2kR6B0RdYBk3TEsDg0s=.49814e3c-8292-48d4-a17b-4808b80b372c@github.com> References: <5eu0MwoZcdf1LjkeHllGGDGX2kR6B0RdYBk3TEsDg0s=.49814e3c-8292-48d4-a17b-4808b80b372c@github.com> Message-ID: On Thu, 20 Oct 2022 03:12:31 GMT, Fei Yang wrote: > This is similar to: https://bugs.openjdk.org/browse/JDK-8295257 > > Remove implicit `= noreg` temporary register arguments for the three methods that still have them. > * `load_heap_oop` > * `store_heap_oop` > * `load_heap_oop_not_null` > > Only `load_heap_oop` is used with the implicit `= noreg` arguments. > After [JDK-8293769](https://bugs.openjdk.org/browse/JDK-8293769), the GCs only use explicitly passed in registers. This will also be the case for generational ZGC. Where it currently requires `load_heap_oop` to provide a second temporary register. > > Testing: Tier1 hotspot with fastdebug build on HiFive Unmatched board. lgtm ------------- Marked as reviewed by fjiang (Author). PR: https://git.openjdk.org/jdk/pull/10778 From dholmes at openjdk.org Thu Oct 20 07:34:00 2022 From: dholmes at openjdk.org (David Holmes) Date: Thu, 20 Oct 2022 07:34:00 GMT Subject: RFR: 8294438: Fix misleading-indentation warnings in hotspot [v3] In-Reply-To: References: Message-ID: <_mUxIObAiNsVdxaC637-8aaIwXyHFc5xaBGu_phRn_0=.b5cf975b-3649-4ad2-82d6-8de11ef09bf8@github.com> On Thu, 20 Oct 2022 07:14:27 GMT, Aleksey Shipilev wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into JDK-8294438-misleading-indentation >> - Merge branch 'master' into JDK-8294438-misleading-indentation >> - Merge branch 'master' into JDK-8294438-misleading-indentation >> - Also javaClasses.cpp >> - Fix > > Thanks! @shipilev this has broken our linux aarch64 builds! [2022-10-20T07:26:59,542Z] workspace/open/src/hotspot/cpu/aarch64/assembler_aarch64.cpp: In member function 'void Address::lea(MacroAssembler*, Register) const': [2022-10-20T07:26:59,542Z] workspace/open/src/hotspot/cpu/aarch64/assembler_aarch64.cpp:138:5: error: this 'else' clause does not guard... 
[-Werror=misleading-indentation] [2022-10-20T07:26:59,542Z] 138 | else [2022-10-20T07:26:59,542Z] | ^~~~ [2022-10-20T07:26:59,542Z] workspace/open/src/hotspot/cpu/aarch64/assembler_aarch64.cpp:140:7: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'else' [2022-10-20T07:26:59,542Z] 140 | break; [2022-10-20T07:26:59,542Z] | ^~~~~ ------------- PR: https://git.openjdk.org/jdk/pull/10444 From rehn at openjdk.org Thu Oct 20 07:34:51 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 20 Oct 2022 07:34:51 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v2] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On Tue, 18 Oct 2022 13:42:40 GMT, Stefan Karlsson wrote: >> Background to this patch: >> >> This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. >> >> PR RFC: >> >> HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: >> >> MetaspaceObj - allocates in the Metaspace >> CHeap - uses malloc >> ResourceObj - ... >> >> The last class sounds like it provide an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Areanas or even CHeap allocated memory. >> >> This is IMHO misleading, and often leads to confusion among HotSpot developers. >> >> I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. >> >> In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. >> >> The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. >> >> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix Shenandoah Thank you! Ship it! ------------- Marked as reviewed by rehn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10745 From vkempik at openjdk.org Thu Oct 20 07:37:07 2022 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 20 Oct 2022 07:37:07 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v6] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Wed, 19 Oct 2022 10:28:20 GMT, Ludovic Henry wrote: >> Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. 
>> >> [1] https://github.com/riscv/riscv-CMOs >> [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero > > Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: > > - Explicit use of temp registers > - fixup! Add -XX:CacheLineSize= to set cache line size Marked as reviewed by vkempik (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/10718 From shade at openjdk.org Thu Oct 20 07:37:13 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Oct 2022 07:37:13 GMT Subject: RFR: 8294438: Fix misleading-indentation warnings in hotspot [v3] In-Reply-To: References: Message-ID: On Thu, 20 Oct 2022 07:14:27 GMT, Aleksey Shipilev wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into JDK-8294438-misleading-indentation >> - Merge branch 'master' into JDK-8294438-misleading-indentation >> - Merge branch 'master' into JDK-8294438-misleading-indentation >> - Also javaClasses.cpp >> - Fix > > Thanks! > @shipilev this has broken our linux aarch64 builds! Whoa. Looking. ------------- PR: https://git.openjdk.org/jdk/pull/10444 From eosterlund at openjdk.org Thu Oct 20 07:41:32 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 20 Oct 2022 07:41:32 GMT Subject: RFR: 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() In-Reply-To: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> References: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> Message-ID: <9PStlNrNfDolJIAQrhWBLXjFGTsRhY0TVZB4YWHwQeI=.56aedabb-f9cc-4362-955b-2aa1367bd026@github.com> On Tue, 18 Oct 2022 17:37:30 GMT, Dean Long wrote: > This change adds a missing is_unloading() check for the callee in SharedRuntime::fixup_callers_callsite() and removes from_compiled_entry_no_trampoline() because it is no longer used. Looks good in general, but I have two questions. src/hotspot/share/runtime/sharedRuntime.cpp line 2113: > 2111: NoSafepointVerifier nsv; > 2112: > 2113: CompiledMethod* callee = moop->code(); There is a moop->code() null check just a few lines below, so now it looks like we are reading the code pointer twice checking if it is null. Is ot enough to do that one time? src/hotspot/share/runtime/sharedRuntime.cpp line 2119: > 2117: > 2118: CodeBlob* cb = CodeCache::find_blob(caller_pc); > 2119: if (cb == NULL || !cb->is_compiled() || callee->is_unloading()) { Why not move the is_unloading check on callee to the if statement just above that checks the callee (as opposed to the callsite)? ------------- Changes requested by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/10747 From shade at openjdk.org Thu Oct 20 07:44:03 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 20 Oct 2022 07:44:03 GMT Subject: RFR: 8294438: Fix misleading-indentation warnings in hotspot [v3] In-Reply-To: References: Message-ID: On Thu, 20 Oct 2022 07:34:29 GMT, Aleksey Shipilev wrote: > > @shipilev this has broken our linux aarch64 builds! > > Whoa. Looking. 
That would be: #10781 ------------- PR: https://git.openjdk.org/jdk/pull/10444 From tschatzl at openjdk.org Thu Oct 20 08:23:52 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 20 Oct 2022 08:23:52 GMT Subject: RFR: 8233697: CHT: Iteration parallelization In-Reply-To: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Message-ID: On Wed, 19 Oct 2022 10:15:46 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to add parallel iteration of the ConcurrentHashTable. The iteration should be done during a safepoint without concurrent modifications to the ConcurrentHashTable. > > Usecase is in parallelizing the merging of large remsets for G1. > > Some background: The problem is that particularly during (G1) mixed gc it happens that the distribution of contents in the CHT is very unbalanced - young gen regions have a very small remembered set (little work), and old gen regions very large ones (much work). > > Since the current work distribution is based on whole remembered sets (i.e. CHTs), this makes for a very unbalanced merge remsets phase in G1 when you have quite a bit more than the number of old gen regions threads at your disposal. > This negatively impacts pause time predictions (and obviously pause times are longer than necessary as many threads are idling to wait for the phase to complete). > > This change only adds the infrastructure code in the CHT, there will be a follow-up with G1 changes. > > Testing: tier 1-3 Changes requested by tschatzl (Reviewer). src/hotspot/share/utilities/concurrentHashTable.hpp line 499: > 497: template > 498: void do_safepoint_scan(SCAN_FUNC& scan_f, BucketsClaimer* bucket_claimer); > 499: Suggestion: // Visit all items with SCAN_FUNC without any protection. // Thread-safe, but must be called at safepoint. class BucketsClaimer; template void do_safepoint_scan(SCAN_FUNC& scan_f, BucketsClaimer* bucket_claimer); This is just a suggestion: maybe put the declaration of `BucketsClaimer` close to the only use. src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 981: > 979: } > 980: > 981: Superfluous whitespace. src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 1350: > 1348: // The table is split into ranges, every increment is one range. > 1349: volatile size_t _next_to_claim; > 1350: size_t _claim_size_log2; // Log number of buckets in claimed range. For naming the constants containing log values (if kept), I would prefer if the existing style to put the `log2` in the front were kept. src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 1359: > 1357: > 1358: public: > 1359: BucketsClaimer(ConcurrentHashTable* cht) : Would it be possible to add a claim size parameter (with default value `DEFAULT_CLAIM_SIZE_LOG2`)? Particularly I'm not convinced that the default value is good :) src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 1370: > 1368: size_t size_log2 = _cht->_table->_log2_size; > 1369: _claim_size_log2 = MIN2(_claim_size_log2, size_log2); > 1370: _limit = (size_t)1 << (size_log2 - _claim_size_log2); Is it really advantageous to keep these values as log2? It seems to me that just storing the actual non-log values would actually require less work everywhere. 
I am fine with restricting claim increments to powers of two (but even that seems artificial), there only seems to be no advantage to use logs anywhere (only that every use needs to do the shift to get the actual value). src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 1393: > 1391: return true; > 1392: } > 1393: } Would it be useful/feasible to put this block of code into a helper method? Something like `claim_from_table(&_next_to_claim, _limit, _claim_size_log2, _cht->get_table(), start, stop, table);`, potentially wrapping the first few parameters into a struct? Also because the second version seems to be missing the early-out, i.e. some copy&paste oversight. ------------- PR: https://git.openjdk.org/jdk/pull/10759 From fyang at openjdk.org Thu Oct 20 08:36:38 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 20 Oct 2022 08:36:38 GMT Subject: RFR: 8295711: Rename ZBarrierSetAssembler::load_at parameter name from "tmp_thread" to "tmp2" Message-ID: On AArch64 and RISC-V, the last formal parameter for ZBarrierSetAssembler::load_at is named "tmp_thread". But the callers will pass an ordinary temporary register for this parameter which has no relation with the thread register. We should rename this formal parameter from "tmp_thread" to "tmp2". Testing: Fastdebug builds on linux-aarch64 & linux-riscv64. ------------- Commit messages: - 8295711: Rename ZBarrierSetAssembler::load_at parameter name from "tmp_thread" to "tmp2" Changes: https://git.openjdk.org/jdk/pull/10783/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10783&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295711 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10783.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10783/head:pull/10783 PR: https://git.openjdk.org/jdk/pull/10783 From fjiang at openjdk.org Thu Oct 20 08:47:55 2022 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 20 Oct 2022 08:47:55 GMT Subject: RFR: 8295711: Rename ZBarrierSetAssembler::load_at parameter name from "tmp_thread" to "tmp2" In-Reply-To: References: Message-ID: On Thu, 20 Oct 2022 08:28:09 GMT, Fei Yang wrote: > This is a trivial change renaming a formal parameter for ZBarrierSetAssembler::load_at. > > On AArch64 and RISC-V, the last formal parameter for ZBarrierSetAssembler::load_at > is named "tmp_thread". But the callers will pass an ordinary temporary register for > this parameter which has no relation with the thread register. We should rename this > formal parameter from "tmp_thread" to "tmp2". > > Testing: fastdebug builds on linux-aarch64 & linux-riscv64. Marked as reviewed by fjiang (Author). ------------- PR: https://git.openjdk.org/jdk/pull/10783 From ihse at openjdk.org Thu Oct 20 10:36:03 2022 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 20 Oct 2022 10:36:03 GMT Subject: Integrated: 8295470: Update openjdk.java.net => openjdk.org URLs in test code In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 11:55:06 GMT, Magnus Ihse Bursie wrote: > This is a continuation of the effort to update all our URLs to the new top-level domain. > > This patch updates (most) URLs in testing code. There still exists references to openjdk.java.net, but that are not strictly used as normal URLs, which I deemed need special care, so I left them out of this one, which is more of a straight-forward search and replace. > > I have manually verified that the links work (or points to bugs.openjdk.org and looks sane; I did not click on all those). 
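As background for the claiming scheme discussed in the ConcurrentHashTable review above (and independent of whether the chunk size is kept as a log2 value), the usual safepoint work-splitting idiom is an atomic counter handing out fixed-size bucket ranges. A stand-alone sketch with invented names, not the proposed BucketsClaimer:

    #include <algorithm>
    #include <atomic>
    #include <cstddef>

    // Workers call claim() until it returns false; each claimed [start, end)
    // bucket range is then scanned without locking, which is only safe because
    // the whole iteration happens inside a safepoint.
    class RangeClaimer {
      std::atomic<size_t> _next{0};
      const size_t _num_buckets;
      const size_t _chunk;   // buckets per claim
    public:
      RangeClaimer(size_t num_buckets, size_t chunk)
        : _num_buckets(num_buckets), _chunk(chunk) {}

      bool claim(size_t* start, size_t* end) {
        size_t from = _next.fetch_add(_chunk, std::memory_order_relaxed);
        if (from >= _num_buckets) {
          return false;                  // everything has been handed out
        }
        *start = from;
        *end   = std::min(from + _chunk, _num_buckets);
        return true;
      }
    };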
I have replaced `http` with `https`. I have replaced links to specific commits on the mercurial server with links to the corresponding commits in the new git repos. This pull request has now been integrated. Changeset: d5a1521f Author: Magnus Ihse Bursie URL: https://git.openjdk.org/jdk/commit/d5a1521fde3f6ff7e810e8257a4722a09c9ef60b Stats: 138 lines in 45 files changed: 46 ins; 0 del; 92 mod 8295470: Update openjdk.java.net => openjdk.org URLs in test code Reviewed-by: michaelm, prr, darcy ------------- PR: https://git.openjdk.org/jdk/pull/10744 From adinn at redhat.com Thu Oct 20 11:20:33 2022 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 20 Oct 2022 12:20:33 +0100 Subject: Biased locking Obsoletion In-Reply-To: <11e412f4-12fb-2200-61d8-b8acc5f52b1c@oracle.com> References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <8ecc52b9-132c-ebde-5759-b1829caaacc5@redhat.com> <4c6d7418-1e99-4f31-af66-3dc949cc0602@redhat.com> <11e412f4-12fb-2200-61d8-b8acc5f52b1c@oracle.com> Message-ID: <13a6581e-24c1-9423-1517-4adfb784eed4@redhat.com> I'm reviving this discussion from an old thread here because of a new Red Hat customer issue that has emerged thanks to removal of biased locking. The customer has some code to compare data in two input streams byte by byte. The benchmark is fairly simple, comparing each file in a suitably long list with itself byte for byte. It includes code like this at its core: boolean contentChanged = false; BufferedInputStream oldContent = ...; BufferedInputStream newContent = ...; try { int newByte = newContent.read(); int oldByte = oldContent.read(); while (newByte != -1 && oldByte != -1 && newByte == oldByte) { newByte = newContent.read(); oldByte = oldContent.read(); } contentChanged = newByte != oldByte; } catch (IOException e) { contentChanged = true; } ... This code slows down considerably when biased locking is not available. Of course, the problem is that the API only provides a synchronized method, BufferedInputStream.read(), for reading single bytes. So, without biased locking (or some improved version of non-biased locking) client code that needs to perform per-byte read+consume steps is going to be harmed. Clearly with this simple example the slowdown can be worked round by doing bulk reads but that is not really the point. The model for client use implemented by the InputStream API suffers from the same issues that mar the OutputStream API. What makes this especially egregious is that this is a *buffered* stream. Buffering is supposed to make piecemeal access more efficient and, no doubt, it does when compared to doing individual reads at the device level. Still, using buffered in the name definitely jars with a client API model that presumes concurrent consumption and punishes byte by byte access. I am not clear why the lock coarsening has not kicked in with this input stream case. That optimization was found not to be effective in the case of the output stream code but the problem was easily fixed (see https://bugs.openjdk.org/browse/JDK-8254078). regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From Alan.Bateman at oracle.com Thu Oct 20 12:01:09 2022 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Thu, 20 Oct 2022 13:01:09 +0100 Subject: Biased locking Obsoletion In-Reply-To: <13a6581e-24c1-9423-1517-4adfb784eed4@redhat.com> References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <8ecc52b9-132c-ebde-5759-b1829caaacc5@redhat.com> <4c6d7418-1e99-4f31-af66-3dc949cc0602@redhat.com> <11e412f4-12fb-2200-61d8-b8acc5f52b1c@oracle.com> <13a6581e-24c1-9423-1517-4adfb784eed4@redhat.com> Message-ID: On 20/10/2022 13:20, Andrew Dinn wrote: > : > > This code slows down considerably when biased locking is not > available. Of course, the problem is that the API only provides a > synchronized method, BufferedInputStream.read(), for reading single > bytes. So, without biased locking (or some improved version of > non-biased locking) client code that needs to perform per-byte > read+consume steps is going to be harmed. A general point is that you are more likely to see this with the java.io APIs because they date from early JDK releases where they was a tenancy to synchronize everything. The synchronization is not specified and for most of these classes it just doesn't make sense to have several threads reading from a stream. However, there is 25+ years of usage, and there is concurrency if there is async close, so it would not be easy to just remove it. The DataOutputStream change that you linked to was okay because the class wasn't thread safe already and part of the change was to add a disclaimer on thread safety to the javadoc. I don't know know which JDK release was used for the test but BIS changed in JDK 19 to use j.u.concurrent locks when a BIS is constructed directly. It may be that it could be changed further to use a stamped lock but it would it would requires exclusive access. Just mentioning this because I think lock coarsening is monitors only (BIS does use monitors when subclassing as existing code may assume synchronization in the super class). -Alan From adinn at redhat.com Thu Oct 20 13:00:33 2022 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 20 Oct 2022 14:00:33 +0100 Subject: Biased locking Obsoletion In-Reply-To: References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <8ecc52b9-132c-ebde-5759-b1829caaacc5@redhat.com> <4c6d7418-1e99-4f31-af66-3dc949cc0602@redhat.com> <11e412f4-12fb-2200-61d8-b8acc5f52b1c@oracle.com> <13a6581e-24c1-9423-1517-4adfb784eed4@redhat.com> Message-ID: <9bb7eb24-48a7-f60c-0302-b53bd60764db@redhat.com> On 20/10/2022 13:01, Alan Bateman wrote: > A general point is that you are more likely to see this with the java.io > APIs because they date from early JDK releases where they was a tenancy > to synchronize everything. The synchronization is not specified and for > most of these classes it just doesn't make sense to have several threads > reading from a stream. However, there is 25+ years of usage, and there > is concurrency if there is async close, so it would not be easy to just > remove it. The DataOutputStream change that you linked to was okay > because the class wasn't thread safe already and part of the change was > to add a disclaimer on thread safety to the javadoc. Yes, I realise this is a very unfortunate legacy issue and advised our customer of that. > I don't know know which JDK release was used for the test but BIS > changed in JDK 19 to use j.u.concurrent locks when a BIS is constructed > directly. 
It may be that it could be changed further to use a stamped > lock but it would it would requires exclusive access. Just mentioning > this because I think lock coarsening is monitors only (BIS does use > monitors when subclassing as existing code may assume synchronization in > the super class). Yes, understood -- indeed, Aleksey Shipilev just explained that to me in a separate, off-list email exchange. I also just noticed that the cited example is tricky to handle even on a JVM that precedes the switch to j.u.c locks because execution involves repeated, alternate synchronization on two different BIS instances. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From egahlin at openjdk.org Thu Oct 20 13:16:28 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 20 Oct 2022 13:16:28 GMT Subject: RFR: 8280131: jcmd reports "Module jdk.jfr not found." when "jdk.management.jfr" is missing [v2] In-Reply-To: References: Message-ID: > Could I have a review of a PR that ensures JFR can be used when only the jdk.jfr module is present in an image. > > The behavior is similar to how -javaagent adds the java.instrument module and "jcmd PID ManagementAgent.status" loads the jdk.management.agent module. > > TestJfrJavaBase.java is replaced with TestModularImage.java. The former test could not be used since the jdk.jfr module is now added to the module graph when -XX:StartFlightRecording is specified. > > Testing: tier1-3 + test/jdk/jdk/jfr > > Thanks > Erik Erik Gahlin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into moddda - Fix disabled - Fix pointer format - Initial ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10723/files - new: https://git.openjdk.org/jdk/pull/10723/files/e5035b14..7b41be0f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10723&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10723&range=00-01 Stats: 18465 lines in 1007 files changed: 8888 ins; 6070 del; 3507 mod Patch: https://git.openjdk.org/jdk/pull/10723.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10723/head:pull/10723 PR: https://git.openjdk.org/jdk/pull/10723 From coleenp at openjdk.org Thu Oct 20 14:14:53 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 20 Oct 2022 14:14:53 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v2] In-Reply-To: References: Message-ID: On Thu, 20 Oct 2022 01:51:36 GMT, David Holmes wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix review comments > > src/hotspot/share/runtime/vframe.cpp line 684: > >> 682: } >> 683: void vframe::print_on(outputStream* st) const { >> 684: _fr.print_value_on(st,NULL); > > I think this was changed in response to Coleens' comment. While I agree we are getting rid of `WizardMode` this PR is about `PrintDeoptimizationDetails` and this change seems unrelated to that (unless nothing else calls `vframe::print`?). There's a couple other printing function that call vframe::print() so I think that should keep WizardMode on for now until the callers become UL converted. 
I don't think the vframe::print_on() version should keep WizardMode since logging calls it with the verbose logging level (guessing this is true, please check!) ------------- PR: https://git.openjdk.org/jdk/pull/10645 From fyang at openjdk.org Thu Oct 20 14:23:53 2022 From: fyang at openjdk.org (Fei Yang) Date: Thu, 20 Oct 2022 14:23:53 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v6] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Wed, 19 Oct 2022 10:28:20 GMT, Ludovic Henry wrote: >> Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. >> >> [1] https://github.com/riscv/riscv-CMOs >> [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero > > Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: > > - Explicit use of temp registers > - fixup! Add -XX:CacheLineSize= to set cache line size Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4127: > 4125: sub(cnt, cnt, tmp2); > 4126: add(tmp3, zr, zr); > 4127: movptr(tmp3, initial_table_end); I think it will be more efficient if we make use of 'auipc' instruction here for this purpose. The current version which makes use of 'movptr' will emits 6 instructions. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4136: > 4134: bind(initial_table_end); > 4135: > 4136: li(tmp1, CacheLineSize >> 3); It will be more consistent to use 'mv' here instead of 'li'. I am considering making 'li' a private method. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 685: > 683: Label small; > 684: int low_limit = MAX2(CacheLineSize, BlockZeroingLowLimit); > 685: __ li(t0, low_limit); Save as above. src/hotspot/cpu/riscv/vm_version_riscv.cpp line 48: > 46: > 47: if (!FLAG_IS_DEFAULT(CacheLineSize) && !is_power_of_2(CacheLineSize)) { > 48: warning("CacheLineSize must be a power of 2"); TBH, I am worried about the case when user specified some inaccurate cache-line size here, especially when the specified value is bigger than the actual cache-line size. The currently implementation won't work in that case. We really need some way to determine the cache-line size at runtime to be safe. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From coleen.phillimore at oracle.com Thu Oct 20 14:26:12 2022 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 20 Oct 2022 10:26:12 -0400 Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v2] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On 10/19/22 2:41 AM, Thomas Stuefe wrote: > On Tue, 18 Oct 2022 13:42:40 GMT, Stefan Karlsson wrote: > >>> Background to this patch: >>> >>> This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? 
Do we really have to provide MEMFLAGS as compile time flags? Etc. >>> >>> PR RFC: >>> >>> HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: >>> >>> MetaspaceObj - allocates in the Metaspace >>> CHeap - uses malloc >>> ResourceObj - ... >>> >>> The last class sounds like it provide an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Areanas or even CHeap allocated memory. >>> >>> This is IMHO misleading, and often leads to confusion among HotSpot developers. >>> >>> I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. >>> >>> In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. >>> >>> The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. >>> >>> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix Shenandoah > So, `AnyObj` is the holdover from the old ResourceObj? Is the intent to move most objects to a clear allocation type in followup RFEs? Yes. > > I'm not sure that we need `AnyObj` at all. Or, we could get rid of it in the future. The purpose of `AnyObj` is to place an object into a different area than its class designer intended. That can be done simply via holder objects, see my earlier comment. It would remove the runtime cost of tracking the allocation type per allocation. It would be a bit less convenient since you have to go thru the holder when accessing the object. But we don't want to have many classes like these anyway. Yes, we're trying to find a nice way to remove AnyObj in favor of a factory or holder object for these classes.? There are fewer than we thought.? This change is nice because we can easily find them by looking for AnyObj, rather than the ResourceObjs. > > I think that many cases where we today have a ResourceObj allocated in C-Heap can actually be made CHeapObj. I'm not sure how many real cases there are where we mix C-heap and RA for the same class. The not well named anymore class ResourceHashtable is one such class that may be better always CHeap obj allocated, but we have to see. There are a couple of AnyObj classes that I don't know anything about so they can be looked at also. > > ---- > > If we go with `AnyObj`, I have a smaller concern: > > `AnyObj` sounds like the typical absolute base class many frameworks have. But here, it has a specific role, and we probably want to discourage its use for new classes. It should only be used where its really needed. It does not have to be super convenient, and maybe should be renamed to something like `MultipleAllocationObj` or similar. > > Then, allocation area for `AnyObj` is determined by the overloaded new I use: > > - `new X;` // RA > - `new (mtTest) X;` // C-Heap > - `new (arena) X;` // lives in Arena > > but this can be confusing, especially for newcomers. The default of RA is surprising, in `ResourceObj` it was right in the name. Since RA allocation is a bit dangerous, I think RA as default can trip over people. 
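For concreteness, the three placements listed just above look roughly like this at a use site (a sketch only: `Foo` is invented, and the exact operator new overloads are whatever the patch defines):

    class Foo : public AnyObj {        // previously: ResourceObj
      int _x;
    public:
      explicit Foo(int x) : _x(x) {}
    };

    void examples(Arena* arena) {
      Foo* r = new Foo(1);             // resource area of the current thread
      Foo* c = new (mtTest) Foo(2);    // C-heap; the MEMFLAGS argument now implies C_HEAP
      Foo* a = new (arena) Foo(3);     // lives in the supplied arena
      delete c;                        // only the C-heap instance needs an explicit delete;
                                       // r and a go away with their resource mark / arena
    }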
I think this RA default allocation is something we want to fix further by making AnyObj not be an allocation class but these objects use a factory.? For this change, we're just separating AnyObj out from true ResourceObjs.? We have further work that we will do. > > I also dislike the MEMFLAGS==CHeap association. MEMFLAGS is semantically different from where the object lives and this may bite us later. > > For example, today we don't require MEMFLAG for RA allocation because the arena is tagged with a single flag. But we could easily switch to per-allocation tracking instead by allowing an arena to carry multiple MEMFLAGS. It would have some advantages. Having arena and ResourceArea tracking have multiple MEMFLAGS would increase space and overhead though.? ResourceObjs/ArenaObjs are supposed to be minimally invasive.? We'd have to talk about a change like that. > > Therefore I would require the user to hand in the allocation type when calling new AnyObj. It would be clearer, and since we don't want to have many of these classes anyway, I think that would be okay. Agree that we don't want to have many of these classes anyway. This AnyObj patch cuts down the number of classes we're looking at, which is why I like it so much.? We're not done though, and your ideas about its replacement are welcome. Thanks, Coleen > > src/hotspot/share/asm/codeBuffer.hpp line 386: > >> 384: // CodeBuffers must be allocated on the stack except for a single >> 385: // special case during expansion which is handled internally. This >> 386: // is done to guarantee proper cleanup of resources. > Not your patch, but comment is misleading. Probably should say "on the ResourceArea". > > ------------- > > PR: https://git.openjdk.org/jdk/pull/10745 From kbarrett at openjdk.org Thu Oct 20 15:01:11 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 20 Oct 2022 15:01:11 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v8] In-Reply-To: References: Message-ID: > 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal > 8155996: Improve concurrent refinement green zone control > 8134303: Introduce -XX:-G1UseConcRefinement > > Please review this change to the control of concurrent refinement. > > This new controller takes a different approach to the problem, addressing a > number of issues. > > The old controller used a multiple of the target number of cards to determine > the range over which increasing numbers of refinement threads should be > activated, and finally activating mutator refinement. This has a variety of > problems. It doesn't account for the processing rate, the rate of new dirty > cards, or the time available to perform the processing. This often leads to > unnecessary spikes in the number of running refinement threads. It also tends > to drive the pending number to the target quickly and keep it there, removing > the benefit from having pending dirty cards filter out new cards for nearby > writes. It can't delay and leave excess cards in the queue because it could > be a long time before another buffer is enqueued. > > The old controller was triggered by mutator threads enqueing card buffers, > when the number of cards in the queue exceeded a threshold near the target. > This required a complex activation protocol between the mutators and the > refinement threads. 
> > With the new controller there is a primary refinement thread that periodically > estimates how many refinement threads need to be running to reach the target > in time for the next GC, along with whether to also activate mutator > refinement. If the primary thread stops running because it isn't currently > needed, it sleeps for a period and reevaluates on wakeup. This eliminates any > involvement in the activation of refinement threads by mutator threads. > > The estimate of how many refinement threads are needed uses a prediction of > time until the next GC, the number of buffered cards, the predicted rate of > new dirty cards, and the predicted refinement rate. The number of running > threads is adjusted based on these periodically performed estimates. > > This new approach allows more dirty cards to be left in the queue until late > in the mutator phase, typically reducing the rate of new dirty cards, which > reduces the amount of concurrent refinement work needed. > > It also smooths out the number of running refinement threads, eliminating the > unnecessarily large spikes that are common with the old method. One benefit > is that the number of refinement threads (lazily) allocated is often much > lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem > described in JDK-8153225.) > > This change also provides a new method for calculating for the number of dirty > cards that should be pending at the start of a GC. While this calculation is > conceptually distinct from the thread control, the two were significanly > intertwined in the old controller. Changing this calculation separately and > first would have changed the behavior of the old controller in ways that might > have introduced regressions. Changing it after the thread control was changed > would have made it more difficult to test and measure the thread control in a > desirable configuration. > > The old calculation had various problems that are described in JDK-8155996. > In particular, it can get more or less stuck at low values, and is slow to > respond to changes. > > The old controller provided a number of product options, none of which were > very useful for real applications, and none of which are very applicable to > the new controller. All of these are being obsoleted. > > -XX:-G1UseAdaptiveConcRefinement > -XX:G1ConcRefinementGreenZone= > -XX:G1ConcRefinementYellowZone= > -XX:G1ConcRefinementRedZone= > -XX:G1ConcRefinementThresholdStep= > > The new controller *could* use G1ConcRefinementGreenZone to provide a fixed > value for the target number of cards, though it is poorly named for that. > > A configuration that was useful for some kinds of debugging and testing was to > disable G1UseAdaptiveConcRefinement and set g1ConcRefinementGreenZone to a > very large value, effectively disabling concurrent refinement. To support > this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic > option has been added (see JDK-8155996). > > The other options are meaningless for the new controller. > > Because of these option changes, a CSR and a release note need to accompany > this change. > > Testing: > mach5 tier1-6 > various performance tests. > local (linux-x64) tier1 with -XX:-G1UseConcRefinement > > Performance testing found no regressions, but also little or no improvement > with default options, which was expected. With default options most of our > performance tests do very little concurrent refinement. 
And even for those > that do, while the old controller had a number of problems, the impact of > those problems is small and hard to measure for most applications. > > When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare > better, particularly when also reducing MaxGCPauseMillis. specjbb2015 with > MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options > held constant) showed a statistically significant improvement of about 4.5% > for critical-jOPS. Using the changed controller, the difference between this > configuration and the default is fairly small, while the baseline shows > significant degradation with the more restrictive options. > > For all tests and configurations the new controller often creates many fewer > refinement threads. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: - Merge branch 'master' into crt2 - more copyright updates - tschatzl comments - adjust young target length periodically - use cards in thread-buffers when revising young list target length - remove remset sampler - move remset-driven young-gen resizing - fix type of predict_dirtied_cards_in_threa_buffers - Merge branch 'master' into crt2 - comments around alloc_bytes_rate being zero - ... and 15 more: https://git.openjdk.org/jdk/compare/9b971626...0c2b4c69 ------------- Changes: https://git.openjdk.org/jdk/pull/10256/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10256&range=07 Stats: 1613 lines in 24 files changed: 670 ins; 664 del; 279 mod Patch: https://git.openjdk.org/jdk/pull/10256.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10256/head:pull/10256 PR: https://git.openjdk.org/jdk/pull/10256 From thomas.stuefe at gmail.com Thu Oct 20 15:55:23 2022 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 20 Oct 2022 17:55:23 +0200 Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v2] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: Hi Coleen, On Thu, Oct 20, 2022 at 4:26 PM wrote: > > > On 10/19/22 2:41 AM, Thomas Stuefe wrote: > > On Tue, 18 Oct 2022 13:42:40 GMT, Stefan Karlsson > wrote: > > > >>> Background to this patch: > >>> > >>> This prototype/patch has been discussed with a few HotSpot devs, and > I've gotten feedback that I should send it out for broader > discussion/review. It could be a first step to make it easier to talk about > our allocation super classes and strategies. This in turn would make it > easier to have further discussions around how to make our allocation > strategies more flexible. E.g. do we really need to tie down utility > classes to a specific allocation strategy? Do we really have to provide > MEMFLAGS as compile time flags? Etc. > >>> > >>> PR RFC: > >>> > >>> HotSpot has a few allocation classes that other classes can inherit > from to get different dynamic-allocation strategies: > >>> > >>> MetaspaceObj - allocates in the Metaspace > >>> CHeap - uses malloc > >>> ResourceObj - ... > >>> > >>> The last class sounds like it provide an allocation strategy to > allocate inside a thread's resource area. This is true, but it also > provides functions to allow the instances to be allocated in Areanas or > even CHeap allocated memory. > >>> > >>> This is IMHO misleading, and often leads to confusion among HotSpot > developers. 
> >>> > >>> I propose that we simplify ResourceObj to only provide an allocation > strategy for resource allocations, and move the multi-allocation strategy > feature to another class, which isn't named ResourceObj. > >>> > >>> In my proposal and prototype I've used the name AnyObj, as short, > simple name. I'm open to changing the name to something else. > >>> > >>> The patch also adds a new class named ArenaObj, which is for objects > only allocated in provided arenas. > >>> > >>> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP > to `operator new`. If you pass in a MEMFLAGS argument it now means that you > want to allocate on the CHeap. > >> Stefan Karlsson has updated the pull request incrementally with one > additional commit since the last revision: > >> > >> Fix Shenandoah > > So, `AnyObj` is the holdover from the old ResourceObj? Is the intent to > move most objects to a clear allocation type in followup RFEs? > > Yes. > > > > I'm not sure that we need `AnyObj` at all. Or, we could get rid of it in > the future. The purpose of `AnyObj` is to place an object into a different > area than its class designer intended. That can be done simply via holder > objects, see my earlier comment. It would remove the runtime cost of > tracking the allocation type per allocation. It would be a bit less > convenient since you have to go thru the holder when accessing the object. > But we don't want to have many classes like these anyway. > > Yes, we're trying to find a nice way to remove AnyObj in favor of a > factory or holder object for these classes. There are fewer than we > thought. This change is nice because we can easily find them by looking > for AnyObj, rather than the ResourceObjs. > This makes sense. We can do it piece by piece, with individual RFEs. > > > > I think that many cases where we today have a ResourceObj allocated in > C-Heap can actually be made CHeapObj. I'm not sure how many real cases > there are where we mix C-heap and RA for the same class. > > The not well named anymore class ResourceHashtable is one such class > that may be better always CHeap obj allocated, but we have to see. There > are a couple of AnyObj classes that I don't know anything about so they > can be looked at also. > > > > ---- > > > > If we go with `AnyObj`, I have a smaller concern: > > > > `AnyObj` sounds like the typical absolute base class many frameworks > have. But here, it has a specific role, and we probably want to discourage > its use for new classes. It should only be used where its really needed. It > does not have to be super convenient, and maybe should be renamed to > something like `MultipleAllocationObj` or similar. > > > > Then, allocation area for `AnyObj` is determined by the overloaded new I > use: > > > > - `new X;` // RA > > - `new (mtTest) X;` // C-Heap > > - `new (arena) X;` // lives in Arena > > > > but this can be confusing, especially for newcomers. The default of RA > is surprising, in `ResourceObj` it was right in the name. Since RA > allocation is a bit dangerous, I think RA as default can trip over people. > > I think this RA default allocation is something we want to fix further > by making AnyObj not be an allocation class but these objects use a > factory. For this change, we're just separating AnyObj out from true > ResourceObjs. We have further work that we will do. > > > > I also dislike the MEMFLAGS==CHeap association. MEMFLAGS is semantically > different from where the object lives and this may bite us later. 
> > > > For example, today we don't require MEMFLAG for RA allocation because > the arena is tagged with a single flag. But we could easily switch to > per-allocation tracking instead by allowing an arena to carry multiple > MEMFLAGS. It would have some advantages. > > Having arena and ResourceArea tracking have multiple MEMFLAGS would > increase space and overhead though. ResourceObjs/ArenaObjs are supposed > to be minimally invasive. We'd have to talk about a change like that. > Sure. But note that the overhead would be *per Arena*, not per Object. Per Object would be terrible :) You'd have a vector of counters per arena where today you have a single counter. Also, only if NMT is on of course. We are talking about dozens, at most hundreds of bytes per physical Thread (per RA) if NMT is on. What we gain are complexity reduction in NMT and increased tracking resolution. RAs accumulate a lot of allocations from different use spaces. But we track all this as just "RA". Or, in compiler threads, as "compiler" (there is this weird hack that basically re-accounts RAs for compiler threads to "compiler" - we could get rid of all these ugly hacks). > > > > Therefore I would require the user to hand in the allocation type when > calling new AnyObj. It would be clearer, and since we don't want to have > many of these classes anyway, I think that would be okay. > > Agree that we don't want to have many of these classes anyway. This > AnyObj patch cuts down the number of classes we're looking at, which is > why I like it so much. We're not done though, and your ideas about its > replacement are welcome. > > Thanks for these explanations! It is unfortunate that outside Oracle one is often out of the loop, but this patch makes a lot of sense to me and is a nice complexity reduction. Cheers, Thomas Thanks, > Coleen > > > > src/hotspot/share/asm/codeBuffer.hpp line 386: > > > >> 384: // CodeBuffers must be allocated on the stack except for a single > >> 385: // special case during expansion which is handled internally. > This > >> 386: // is done to guarantee proper cleanup of resources. > > Not your patch, but comment is misleading. Probably should say "on the > ResourceArea". > > > > ------------- > > > > PR: https://git.openjdk.org/jdk/pull/10745 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuefe at openjdk.org Thu Oct 20 16:00:47 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Oct 2022 16:00:47 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v2] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On Tue, 18 Oct 2022 13:42:40 GMT, Stefan Karlsson wrote: >> Background to this patch: >> >> This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. 
>> >> PR RFC: >> >> HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: >> >> MetaspaceObj - allocates in the Metaspace >> CHeap - uses malloc >> ResourceObj - ... >> >> The last class sounds like it provides an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Arenas or even CHeap allocated memory. >> >> This is IMHO misleading, and often leads to confusion among HotSpot developers. >> >> I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. >> >> In my proposal and prototype I've used the name AnyObj, as a short, simple name. I'm open to changing the name to something else. >> >> The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. >> >> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix Shenandoah This is a nice patch, and I like it, especially in the light of Coleen's explanation (Skara bot does not seem to reproduce those to Github). Ship it! ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/10745 From aph at openjdk.org Thu Oct 20 17:43:49 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 20 Oct 2022 17:43:49 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic So here's a thought: Reading and writing floating-point control flags can be expensive because they're serializing operations. However, we can discover whether the processor has been put into an "odd" rounding mode with just a few floating-point instructions, so we can do: if (epsilon + epsilon == 0) { rtn = fesetenv(&default_fenv) } which ends up as movsd xmm1,QWORD PTR [epsilon] pxor xmm4,xmm4 addsd xmm1,xmm1 ucomisd xmm0,xmm4 jnp ... We'd need a bit more fiddling to detect changes of rounding mode as well, if we wanted to do that. That might well be cheap enough that we could do it on JNI calls. Do you think it would be worth my while doing some timings?
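As an illustration of that thought, here is a small self-contained sketch that probes the floating-point environment with a few arithmetic operations and only calls fesetenv() when something looks off. It is not the proposed JDK change; the constant below is the 2^-53 + 2^-105 threshold suggested in the replies further down, and the denormal probe mirrors the epsilon check above.

    #include <cfenv>
    #include <cfloat>
    #include <cstdio>

    // ROUND_THRESH is 2^-53 + 2^-105: just over half an ulp of 1.0, so under
    // round-to-nearest both comparisons below are false, while any of the three
    // directed rounding modes makes at least one of them true.
    static const double ROUND_THRESH = 0x1.0000000000001p-53;

    static bool fp_env_looks_default() {
      // volatile keeps the compiler from folding the probes at compile time.
      volatile double denorm = DBL_MIN / 4.0;   // subnormal; the sum becomes zero under FTZ/DAZ
      volatile double one = 1.0;
      bool flushed   = (denorm + denorm) == 0.0;
      bool not_rne_a = (one + ROUND_THRESH) == 1.0;
      bool not_rne_b = (-one - ROUND_THRESH) == -1.0;
      return !flushed && !not_rne_a && !not_rne_b;
    }

    int main() {
      if (!fp_env_looks_default()) {
        fesetenv(FE_DFL_ENV);                   // only pay for the control-register write when needed
        std::puts("FP environment was off; restored the default");
      } else {
        std::puts("FP environment looks default");
      }
      return 0;
    }

Both probes are plain loads, adds and compares, so the potentially serializing control-register write is only paid when the environment has actually been clobbered.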
------------- PR: https://git.openjdk.org/jdk/pull/10661 From joe.darcy at oracle.com Thu Oct 20 20:05:52 2022 From: joe.darcy at oracle.com (Joseph D. Darcy) Date: Thu, 20 Oct 2022 13:05:52 -0700 Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <7ae4f3c8-2ad9-e670-c7f6-85e1f24c2d8a@oracle.com> On 10/20/2022 10:43 AM, Andrew Haley wrote: > On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: > >>> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >>> >>> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >>> >>> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >>> >>> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic > So here's a thought: > > Reading and writing floating-point control flags can be expensive because they're serializing operations. However, we can discover whether the processor has been put into an "odd" rounding mode with just a few floating-point instructions, so we can do: > > > if (epsilon + epsilon == 0) { > rtn = fesetenv(&default_fenv) > } > Assuming the rounding mode could be one of the four classic rounding modes (to nearest even, to +infinity, to -infinity, to zero), two calculations are needed to determine if the mode is set to nearest even, as required by the JVM. Candidate calculations are 1.0 + ROUND_THRESH == 1.0 and -1.0 - ROUND_THRESH == -1.0, with the decoding: false/false => to nearest; false/true => to positive infinity; true/false => to negative infinity; true/true => to zero. For double, the double rounding threshold is 2^-53 + 2^-105 ≈ 1.1102230246251568e-16. An analogous constant can be derived for float, if desired. HTH, -Joe From joe.darcy at oracle.com Thu Oct 20 20:18:13 2022 From: joe.darcy at oracle.com (Joseph D. Darcy) Date: Thu, 20 Oct 2022 13:18:13 -0700 Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: <7ae4f3c8-2ad9-e670-c7f6-85e1f24c2d8a@oracle.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <7ae4f3c8-2ad9-e670-c7f6-85e1f24c2d8a@oracle.com> Message-ID: <571980c0-7c79-b0d1-f527-6862b6dfa0e8@oracle.com> PS And additional expressions could be crafted to rule out flush-to-zero, treating all subnormals as zero, and other non-compliant modes of operation. -Joe On 10/20/2022 1:05 PM, Joseph D. Darcy wrote: > > On 10/20/2022 10:43 AM, Andrew Haley wrote: >> On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> >>>> A bug in GCC causes shared libraries linked with -ffast-math to >>>> disable denormal arithmetic. This breaks Java's floating-point >>>> semantics. >>>> >>>> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >>>> >>>> One solution is to save and restore the floating-point control word >>>> around System.loadLibrary().
This isn't perfect, because some >>>> shared library might load another shared library at runtime, but >>>> it's a lot better than what we do now. >>>> >>>> However, this fix is not complete. `dlopen()` is called from many >>>> places in the JDK. I guess the best thing to do is find and wrap >>>> them all. I'd like to hear people's opinions. >>> Andrew Haley has updated the pull request incrementally with one >>> additional commit since the last revision: >>> >>> 8295159: DSO created with -ffast-math breaks Java floating-point >>> arithmetic >> So here's a thought: >> >> Reading and writing floating-point control flags can be expensive >> because they're serializing operations. However, we can discover >> whether the processor has been put into an "odd" rounding mode with >> just a few floating-point instructions, so we can do: >> >> >> if (epsilon + epsilon == 0) { >> rtn = fesetenv(&default_fenv) >> } >> > Assuming the rounding mode could be one of the four classic rounding > modes (to nearest even, to +infinity, to -infinity, to zero), two > calculations are needed to determine if the mode is set to nearest > even, as required by the JVM. > > Candidate calculations are > > 1.0 + ROUND_THRESH == 1.0 > -1.0 - ROUND_THRESH == -1.0 > > with the decoding > > false/false => to nearest > false/true => to positive infinity > true/false => to negative infinity > true/true => to zero > > For double, the double rounding threshold is 2^-53 + 2^-105 ≈ > 1.1102230246251568e-16. An analogous constant can be derived for > float, if desired. > > HTH, > > -Joe > From kbarrett at openjdk.org Thu Oct 20 20:29:59 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 20 Oct 2022 20:29:59 GMT Subject: RFR: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal [v3] In-Reply-To: References: Message-ID: On Wed, 5 Oct 2022 13:24:19 GMT, Stefan Johansson wrote: >> Kim Barrett has updated the pull request incrementally with three additional commits since the last revision: >> >> - wanted vs needed nomenclature >> - remove several spurious "scan" >> - delay => wait_time_ms > > Marked as reviewed by sjohanss (Reviewer). Thanks @kstefanj , @tschatzl , @albertnetymk , @walulyai for reviews and additional performance testing. ------------- PR: https://git.openjdk.org/jdk/pull/10256 From vlivanov at openjdk.org Thu Oct 20 20:30:13 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 20 Oct 2022 20:30:13 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions.
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic That sounds like a very interesting idea. It would be very helpful to get an understanding how much overhead `STMXCSR` plus a branch adds in JNI stub to decide whether it's worth optimizing for. Call stub already employs an optimization to save on writing to MXCSR: Label skip_ldmx; __ stmxcsr(mxcsr_save); __ movl(rax, mxcsr_save); __ andl(rax, 0xFFC0); // Mask out any pending exceptions (only check control and mask bits) ExternalAddress mxcsr_std(StubRoutines::x86::addr_mxcsr_std()); __ cmp32(rax, mxcsr_std, rscratch1); __ jcc(Assembler::equal, skip_ldmx); __ ldmxcsr(mxcsr_std, rscratch1); __ bind(skip_ldmx); According to [uops.info](https://uops.info/html-instr/STMXCSR_M32.html), latencies for `STMXCSR` vary from 7-12 cycles on Intel to up to 20 on AMD. I haven't found any details about the actual implementations in silicon (can't confirm it serializes the execution), so I'm curious how much branch prediction can hide the latency in this particular case. If it turns out to be worth optimizing `STMXCSR` away, I see other problematic cases: StubRoutines::x86::_mxcsr_std = 0x1F80; // MXCSR.b 10987654321098765432109876543210 // 0xFFC0 00000000000000001111111111000000 // mask // 0x1F80 00000000000000000001111110000000 // MXCSR value used by JVM // 0x8040 00000000000000001000000001000000 // the bits -ffast-math mode unconditionally sets // MXCSR bits: // 15 FTZ Flush to Zero 0 = 1 // 14:13 RC Rounding Control 00 // 12 PM Precision Exception Mask 1 // 11 UM Underflow Exception Mask 1 // 10 OM Overflow Exception Mask 1 // 9 ZM Zero-Divide Exception Mask 1 // 8 DM Denormalized-Operand Exception Mask 1 // 7 IM Invalid-Operation Exception Mask 1 // 6 DAZ Denormals Are Zeros 0 = 1 The GCC bugs with `-ffast-math` only corrupts `FTZ` and `DAZ`. But `RC` and exception masks may be corrupted as well the same way and I believe the consequences are be similar (silent divergence in results during FP computations). ------------- PR: https://git.openjdk.org/jdk/pull/10661 From iwalulya at openjdk.org Thu Oct 20 20:30:52 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 20 Oct 2022 20:30:52 GMT Subject: RFR: 8233697: CHT: Iteration parallelization [v2] In-Reply-To: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Message-ID: > Hi, > > Please review this change to add parallel iteration of the ConcurrentHashTable. The iteration should be done during a safepoint without concurrent modifications to the ConcurrentHashTable. > > Usecase is in parallelizing the merging of large remsets for G1. > > Some background: The problem is that particularly during (G1) mixed gc it happens that the distribution of contents in the CHT is very unbalanced - young gen regions have a very small remembered set (little work), and old gen regions very large ones (much work). > > Since the current work distribution is based on whole remembered sets (i.e. CHTs), this makes for a very unbalanced merge remsets phase in G1 when you have quite a bit more than the number of old gen regions threads at your disposal. > This negatively impacts pause time predictions (and obviously pause times are longer than necessary as many threads are idling to wait for the phase to complete). 
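To make the intended parallelization concrete, here is a standalone sketch of the usual claiming scheme for this kind of iteration: workers repeatedly grab fixed-size ranges of buckets from a shared atomic cursor, so one huge table no longer pins a single thread. The names (`BucketRangeClaimer`, the chunk size) are invented for this sketch and are not the CHT code in the patch.

    #include <algorithm>
    #include <atomic>
    #include <cstddef>
    #include <thread>
    #include <vector>

    // Hands out [start, end) ranges of bucket indices until the table is exhausted.
    struct BucketRangeClaimer {
      std::atomic<std::size_t> _next{0};
      const std::size_t _limit;
      const std::size_t _claim_size;
      BucketRangeClaimer(std::size_t limit, std::size_t claim_size)
        : _limit(limit), _claim_size(claim_size) {}
      bool claim(std::size_t* start, std::size_t* end) {
        std::size_t s = _next.fetch_add(_claim_size, std::memory_order_relaxed);
        if (s >= _limit) return false;               // everything already handed out
        *start = s;
        *end = std::min(s + _claim_size, _limit);
        return true;
      }
    };

    int main() {
      const std::size_t num_buckets = std::size_t(1) << 16;
      std::vector<int> buckets(num_buckets, 1);      // stand-in for the table's buckets
      BucketRangeClaimer claimer(num_buckets, 1024); // 1024 buckets per claim
      std::atomic<long> total{0};

      auto worker = [&] {
        std::size_t start, end;
        while (claimer.claim(&start, &end)) {
          long local = 0;
          for (std::size_t i = start; i < end; i++) local += buckets[i]; // "visit" each bucket
          total.fetch_add(local, std::memory_order_relaxed);
        }
      };

      std::vector<std::thread> workers;
      for (int i = 0; i < 4; i++) workers.emplace_back(worker);
      for (auto& t : workers) t.join();
      return total.load() == long(num_buckets) ? 0 : 1;
    }

The same chunked-claiming idea is what lets work be balanced across threads even when the per-table sizes are as skewed as described above.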
> > This change only adds the infrastructure code in the CHT, there will be a follow-up with G1 changes. > > Testing: tier 1-3 Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: thomas review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10759/files - new: https://git.openjdk.org/jdk/pull/10759/files/b06a4e2c..0e3e0356 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10759&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10759&range=00-01 Stats: 154 lines in 2 files changed: 68 ins; 66 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/10759.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10759/head:pull/10759 PR: https://git.openjdk.org/jdk/pull/10759 From kbarrett at openjdk.org Thu Oct 20 20:33:14 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 20 Oct 2022 20:33:14 GMT Subject: Integrated: 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal In-Reply-To: References: Message-ID: On Wed, 14 Sep 2022 00:36:18 GMT, Kim Barrett wrote: > 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal > 8155996: Improve concurrent refinement green zone control > 8134303: Introduce -XX:-G1UseConcRefinement > > Please review this change to the control of concurrent refinement. > > This new controller takes a different approach to the problem, addressing a > number of issues. > > The old controller used a multiple of the target number of cards to determine > the range over which increasing numbers of refinement threads should be > activated, and finally activating mutator refinement. This has a variety of > problems. It doesn't account for the processing rate, the rate of new dirty > cards, or the time available to perform the processing. This often leads to > unnecessary spikes in the number of running refinement threads. It also tends > to drive the pending number to the target quickly and keep it there, removing > the benefit from having pending dirty cards filter out new cards for nearby > writes. It can't delay and leave excess cards in the queue because it could > be a long time before another buffer is enqueued. > > The old controller was triggered by mutator threads enqueing card buffers, > when the number of cards in the queue exceeded a threshold near the target. > This required a complex activation protocol between the mutators and the > refinement threads. > > With the new controller there is a primary refinement thread that periodically > estimates how many refinement threads need to be running to reach the target > in time for the next GC, along with whether to also activate mutator > refinement. If the primary thread stops running because it isn't currently > needed, it sleeps for a period and reevaluates on wakeup. This eliminates any > involvement in the activation of refinement threads by mutator threads. > > The estimate of how many refinement threads are needed uses a prediction of > time until the next GC, the number of buffered cards, the predicted rate of > new dirty cards, and the predicted refinement rate. The number of running > threads is adjusted based on these periodically performed estimates. > > This new approach allows more dirty cards to be left in the queue until late > in the mutator phase, typically reducing the rate of new dirty cards, which > reduces the amount of concurrent refinement work needed. 
> > It also smooths out the number of running refinement threads, eliminating the > unnecessarily large spikes that are common with the old method. One benefit > is that the number of refinement threads (lazily) allocated is often much > lower now. (This plus UseDynamicNumberOfGCThreads mitigates the problem > described in JDK-8153225.) > > This change also provides a new method for calculating for the number of dirty > cards that should be pending at the start of a GC. While this calculation is > conceptually distinct from the thread control, the two were significanly > intertwined in the old controller. Changing this calculation separately and > first would have changed the behavior of the old controller in ways that might > have introduced regressions. Changing it after the thread control was changed > would have made it more difficult to test and measure the thread control in a > desirable configuration. > > The old calculation had various problems that are described in JDK-8155996. > In particular, it can get more or less stuck at low values, and is slow to > respond to changes. > > The old controller provided a number of product options, none of which were > very useful for real applications, and none of which are very applicable to > the new controller. All of these are being obsoleted. > > -XX:-G1UseAdaptiveConcRefinement > -XX:G1ConcRefinementGreenZone= > -XX:G1ConcRefinementYellowZone= > -XX:G1ConcRefinementRedZone= > -XX:G1ConcRefinementThresholdStep= > > The new controller *could* use G1ConcRefinementGreenZone to provide a fixed > value for the target number of cards, though it is poorly named for that. > > A configuration that was useful for some kinds of debugging and testing was to > disable G1UseAdaptiveConcRefinement and set g1ConcRefinementGreenZone to a > very large value, effectively disabling concurrent refinement. To support > this use case with the new controller, the -XX:-G1UseConcRefinement diagnostic > option has been added (see JDK-8155996). > > The other options are meaningless for the new controller. > > Because of these option changes, a CSR and a release note need to accompany > this change. > > Testing: > mach5 tier1-6 > various performance tests. > local (linux-x64) tier1 with -XX:-G1UseConcRefinement > > Performance testing found no regressions, but also little or no improvement > with default options, which was expected. With default options most of our > performance tests do very little concurrent refinement. And even for those > that do, while the old controller had a number of problems, the impact of > those problems is small and hard to measure for most applications. > > When reducing G1RSetUpdatingPauseTimePercent the new controller seems to fare > better, particularly when also reducing MaxGCPauseMillis. specjbb2015 with > MaxGCPauseMillis=75 and G1RSetUpdatingPauseTimePercent=3 (and other options > held constant) showed a statistically significant improvement of about 4.5% > for critical-jOPS. Using the changed controller, the difference between this > configuration and the default is fairly small, while the baseline shows > significant degradation with the more restrictive options. > > For all tests and configurations the new controller often creates many fewer > refinement threads. This pull request has now been integrated. 
Changeset: 028e8b3d Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/028e8b3d5e7e1791a9ed0af244f74d21fb12ba81 Stats: 1613 lines in 24 files changed: 670 ins; 664 del; 279 mod 8137022: Concurrent refinement thread adjustment and (de-)activation suboptimal 8155996: Improve concurrent refinement green zone control 8134303: Introduce -XX:-G1UseConcRefinement Reviewed-by: sjohanss, tschatzl, iwalulya, ayang ------------- PR: https://git.openjdk.org/jdk/pull/10256 From iwalulya at openjdk.org Thu Oct 20 20:38:02 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 20 Oct 2022 20:38:02 GMT Subject: RFR: 8233697: CHT: Iteration parallelization [v3] In-Reply-To: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Message-ID: > Hi, > > Please review this change to add parallel iteration of the ConcurrentHashTable. The iteration should be done during a safepoint without concurrent modifications to the ConcurrentHashTable. > > Usecase is in parallelizing the merging of large remsets for G1. > > Some background: The problem is that particularly during (G1) mixed gc it happens that the distribution of contents in the CHT is very unbalanced - young gen regions have a very small remembered set (little work), and old gen regions very large ones (much work). > > Since the current work distribution is based on whole remembered sets (i.e. CHTs), this makes for a very unbalanced merge remsets phase in G1 when you have quite a bit more than the number of old gen regions threads at your disposal. > This negatively impacts pause time predictions (and obviously pause times are longer than necessary as many threads are idling to wait for the phase to complete). > > This change only adds the infrastructure code in the CHT, there will be a follow-up with G1 changes. > > Testing: tier 1-3 Ivan Walulya has updated the pull request incrementally with two additional commits since the last revision: - move BucketsClaimer declaration - move BucketsClaimer declaration ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10759/files - new: https://git.openjdk.org/jdk/pull/10759/files/0e3e0356..b6fe308e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10759&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10759&range=01-02 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10759.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10759/head:pull/10759 PR: https://git.openjdk.org/jdk/pull/10759 From ccheung at openjdk.org Thu Oct 20 22:53:18 2022 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 20 Oct 2022 22:53:18 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v4] In-Reply-To: <9tmGKUHiwVATyrTUwn8Po2tl4Ba5NqmwARD_Vh4bbro=.1215e5cb-7c33-4aa1-9091-215175508c02@github.com> References: <9tmGKUHiwVATyrTUwn8Po2tl4Ba5NqmwARD_Vh4bbro=.1215e5cb-7c33-4aa1-9091-215175508c02@github.com> Message-ID: <97_BDeGvocUwYzhMf7-p7FDhONymbbuDpyKp_1Km0dU=.64f85c5a-2945-4899-b476-c52866f12b98@github.com> On Wed, 19 Oct 2022 18:39:25 GMT, Ioi Lam wrote: >> Some `JVM_CONSTANT_Class` entries are guaranteed to resolve to the same value at both CDS dump time and run time: >> >> - Classes that are resolved during `vmClasses::resolve_all()`. These classes cannot be replaced by JVMTI agents at run time. 
>> - Supertypes -- at run time, a class `C` can be loaded from the CDS archive only if all of `C`'s super types are also loaded from the CDS archive. Therefore, we know that a `JVM_CONSTANT_Class` reference to a supertype of `C` must resolve to the same value at both CDS dump time and run time. >> >> By doing the resolution at dump time, we can speed up run time start-up by a little bit. >> >> The `ClassPrelinker` class added by this PR will also be used in future RFEs for pre-resolving other constant pool entries. The ultimate goal is to resolve `invokedynamic` and `invokehandle` so we can significantly improve the start-up time of features such as Lambda expressions and String concatenation. See [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - @coleenp comments: changed to AllStatic > - Merge branch 'master' into 8293979-resolve-class-references-at-dumptime > - fixed product build > - @coleenp comments > - Merge branch 'master' into 8293979-resolve-class-references-at-dumptime > - 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time Looks good. Just one nit in constantPool.cpp. src/hotspot/share/oops/constantPool.cpp line 376: > 374: set_resolved_references(OopHandle()); > 375: > 376: bool archived = false; I think this declaration could be moved to line 392 since it is only used in that case. ------------- Marked as reviewed by ccheung (Reviewer). PR: https://git.openjdk.org/jdk/pull/10330 From dean.long at oracle.com Fri Oct 21 00:22:58 2022 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 20 Oct 2022 17:22:58 -0700 Subject: Biased locking Obsoletion In-Reply-To: <9bb7eb24-48a7-f60c-0302-b53bd60764db@redhat.com> References: <7ca651a8-5861-92cf-5d31-6a7fd09700c6@oracle.com> <8ecc52b9-132c-ebde-5759-b1829caaacc5@redhat.com> <4c6d7418-1e99-4f31-af66-3dc949cc0602@redhat.com> <11e412f4-12fb-2200-61d8-b8acc5f52b1c@oracle.com> <13a6581e-24c1-9423-1517-4adfb784eed4@redhat.com> <9bb7eb24-48a7-f60c-0302-b53bd60764db@redhat.com> Message-ID: Does wrapping the while loop with synchronized (newContent) { synchronized (oldContent) { ... } } allow lock coarsening to work in this case? dl On 10/20/22 6:00 AM, Andrew Dinn wrote: > On 20/10/2022 13:01, Alan Bateman wrote: >> A general point is that you are more likely to see this with the >> java.io APIs because they date from early JDK releases where there was >> a tendency to synchronize everything. The synchronization is not >> specified and for most of these classes it just doesn't make sense to >> have several threads reading from a stream. However, there is 25+ >> years of usage, and there is concurrency if there is async close, so >> it would not be easy to just remove it. The DataOutputStream change >> that you linked to was okay because the class wasn't thread safe >> already and part of the change was to add a disclaimer on thread >> safety to the javadoc. > > Yes, I realise this is a very unfortunate legacy issue and advised our > customer of that. > >> I don't know which JDK release was used for the test but BIS >> changed in JDK 19 to use j.u.concurrent locks when a BIS is >> constructed directly. It may be that it could be changed further to >> use a stamped lock but it would require exclusive access.
>> Just mentioning this because I think lock coarsening is monitors only >> (BIS does use monitors when subclassing as existing code may assume >> synchronization in the super class). > Yes, understood -- indeed, Aleksey Shipilev just explained that to me > in a separate, off-list email exchange. I also just noticed that the > cited example is tricky to handle even on a JVM that precedes the > switch to j.u.c locks because execution involves repeated, alternate > synchronization on two different BIS instances. > > regards, > > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > From haosun at openjdk.org Fri Oct 21 01:40:51 2022 From: haosun at openjdk.org (Hao Sun) Date: Fri, 21 Oct 2022 01:40:51 GMT Subject: RFR: 8295711: Rename ZBarrierSetAssembler::load_at parameter name from "tmp_thread" to "tmp2" In-Reply-To: References: Message-ID: <8_JXEy0H5byVGE6_77RkrTD-KJ-AXBfm2_E7hI46oMY=.5d2d9a53-3f94-4b41-aaa8-1f9ddd32f9e9@github.com> On Thu, 20 Oct 2022 08:28:09 GMT, Fei Yang wrote: > This is a trivial change renaming a formal parameter for ZBarrierSetAssembler::load_at. > > On AArch64 and RISC-V, the last formal parameter for ZBarrierSetAssembler::load_at > is named "tmp_thread". But the callers will pass an ordinary temporary register for > this parameter which has no relation with the thread register. We should rename this > formal parameter from "tmp_thread" to "tmp2". > > Testing: fastdebug builds on linux-aarch64 & linux-riscv64. LGTM (I'm not a Reviewer). ------------- Marked as reviewed by haosun (Author). PR: https://git.openjdk.org/jdk/pull/10783 From fyang at openjdk.org Fri Oct 21 02:33:24 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 21 Oct 2022 02:33:24 GMT Subject: RFR: 8295703: RISC-V: Remove implicit noreg temp register arguments in MacroAssembler In-Reply-To: References: <5eu0MwoZcdf1LjkeHllGGDGX2kR6B0RdYBk3TEsDg0s=.49814e3c-8292-48d4-a17b-4808b80b372c@github.com> Message-ID: On Thu, 20 Oct 2022 07:24:42 GMT, Feilong Jiang wrote: >> This is similar to: https://bugs.openjdk.org/browse/JDK-8295257 >> >> Remove implicit `= noreg` temporary register arguments for the three methods that still have them. >> * `load_heap_oop` >> * `store_heap_oop` >> * `load_heap_oop_not_null` >> >> Only `load_heap_oop` is used with the implicit `= noreg` arguments. >> After [JDK-8293769](https://bugs.openjdk.org/browse/JDK-8293769), the GCs only use explicitly passed in registers. This will also be the case for generational ZGC. Where it currently requires `load_heap_oop` to provide a second temporary register. >> >> Testing: Tier1 hotspot with fastdebug build on HiFive Unmatched board. > > lgtm @feilongjiang @shipilev : Thanks for the review. 
------------- PR: https://git.openjdk.org/jdk/pull/10778 From fyang at openjdk.org Fri Oct 21 02:33:26 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 21 Oct 2022 02:33:26 GMT Subject: Integrated: 8295703: RISC-V: Remove implicit noreg temp register arguments in MacroAssembler In-Reply-To: <5eu0MwoZcdf1LjkeHllGGDGX2kR6B0RdYBk3TEsDg0s=.49814e3c-8292-48d4-a17b-4808b80b372c@github.com> References: <5eu0MwoZcdf1LjkeHllGGDGX2kR6B0RdYBk3TEsDg0s=.49814e3c-8292-48d4-a17b-4808b80b372c@github.com> Message-ID: On Thu, 20 Oct 2022 03:12:31 GMT, Fei Yang wrote: > This is similar to: https://bugs.openjdk.org/browse/JDK-8295257 > > Remove implicit `= noreg` temporary register arguments for the three methods that still have them. > * `load_heap_oop` > * `store_heap_oop` > * `load_heap_oop_not_null` > > Only `load_heap_oop` is used with the implicit `= noreg` arguments. > After [JDK-8293769](https://bugs.openjdk.org/browse/JDK-8293769), the GCs only use explicitly passed in registers. This will also be the case for generational ZGC. Where it currently requires `load_heap_oop` to provide a second temporary register. > > Testing: Tier1 hotspot with fastdebug build on HiFive Unmatched board. This pull request has now been integrated. Changeset: ef62b614 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/ef62b614d1760d198dcb7f5f0794fc3dc55587a7 Stats: 14 lines in 3 files changed: 0 ins; 0 del; 14 mod 8295703: RISC-V: Remove implicit noreg temp register arguments in MacroAssembler Reviewed-by: shade, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/10778 From dholmes at openjdk.org Fri Oct 21 03:22:48 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 21 Oct 2022 03:22:48 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v2] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On Thu, 20 Oct 2022 15:57:15 GMT, Thomas Stuefe wrote: > Skara bot does not seem to reproduce those to Github. Ship it! @tstuefe I've reported that issue to skara folk. The bots also don't resend emails when typos are fixed ;-) :) ------------- PR: https://git.openjdk.org/jdk/pull/10745 From stuefe at openjdk.org Fri Oct 21 05:46:48 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 21 Oct 2022 05:46:48 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v2] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On Fri, 21 Oct 2022 03:20:35 GMT, David Holmes wrote: > > Skara bot does not seem to reproduce those to Github. Ship it! > > @tstuefe I've reported that issue to skara folk. The bots also don't resend emails when typos are fixed ;-) :) I hoped nobody would notice :) ------------- PR: https://git.openjdk.org/jdk/pull/10745 From sspitsyn at openjdk.org Fri Oct 21 06:14:59 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 21 Oct 2022 06:14:59 GMT Subject: RFR: 8294486: Remove vmTestbase/nsk/jvmti/ tests ported to serviceability/jvmti. In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 19:00:11 GMT, Leonid Mesnik wrote: > The fix removes nsk/jvmti/ tests ported to serviceability/jvmti and forward-ports corresponding fixed. The suspend/resume tests require more work covered by https://bugs.openjdk.org/browse/JDK-8295169. A question. 
I do not see the following tests deleted while they are deleted from the `TEST.quick-groups` : - vmTestbase/nsk/jvmti/RunAgentThread/agentthr001/TestDescription.java \ - vmTestbase/nsk/jvmti/RunAgentThread/agentthr001/TestDescription.java \ - vmTestbase/nsk/jvmti/ThreadEnd/threadend001/TestDescription.java \ - vmTestbase/nsk/jvmti/ThreadEnd/threadend002/TestDescription.java \ ------------- PR: https://git.openjdk.org/jdk/pull/10665 From tschatzl at openjdk.org Fri Oct 21 06:46:59 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 21 Oct 2022 06:46:59 GMT Subject: RFR: 8233697: CHT: Iteration parallelization [v3] In-Reply-To: References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Message-ID: On Thu, 20 Oct 2022 20:38:02 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to add parallel iteration of the ConcurrentHashTable. The iteration should be done during a safepoint without concurrent modifications to the ConcurrentHashTable. >> >> Use case is in parallelizing the merging of large remsets for G1. >> >> Some background: The problem is that particularly during (G1) mixed gc it happens that the distribution of contents in the CHT is very unbalanced - young gen regions have a very small remembered set (little work), and old gen regions very large ones (much work). >> >> Since the current work distribution is based on whole remembered sets (i.e. CHTs), this makes for a very unbalanced merge remsets phase in G1 when you have quite a bit more than the number of old gen regions threads at your disposal. >> This negatively impacts pause time predictions (and obviously pause times are longer than necessary as many threads are idling to wait for the phase to complete). >> >> This change only adds the infrastructure code in the CHT, there will be a follow-up with G1 changes. >> >> Testing: tier 1-3 > > Ivan Walulya has updated the pull request incrementally with two additional commits since the last revision: > > - move BucketsClaimer declaration > - move BucketsClaimer declaration Changes requested by tschatzl (Reviewer). src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 1358: > 1356: } > 1357: return false; > 1358: } Minor nit: A better place for this method is probably `InternalTableClaimer`; then you can also probably improve encapsulation (visibility) of the members if you want, but I'm good with keeping everything public (as default for a struct) for this helper class. I did not think through the suggestion earlier. ------------- PR: https://git.openjdk.org/jdk/pull/10759 From tschatzl at openjdk.org Fri Oct 21 06:50:48 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 21 Oct 2022 06:50:48 GMT Subject: RFR: 8295711: Rename ZBarrierSetAssembler::load_at parameter name from "tmp_thread" to "tmp2" In-Reply-To: References: Message-ID: On Thu, 20 Oct 2022 08:28:09 GMT, Fei Yang wrote: > This is a trivial change renaming a formal parameter for ZBarrierSetAssembler::load_at. > > On AArch64 and RISC-V, the last formal parameter for ZBarrierSetAssembler::load_at > is named "tmp_thread". But the callers will pass an ordinary temporary register for > this parameter which has no relation with the thread register. We should rename this > formal parameter from "tmp_thread" to "tmp2". > > Testing: fastdebug builds on linux-aarch64 & linux-riscv64. Please also fix the `tmp_thread` parameter for x86 while you are at it ;) ------------- Changes requested by tschatzl (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10783 From dlong at openjdk.org Fri Oct 21 07:01:52 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Oct 2022 07:01:52 GMT Subject: RFR: 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() In-Reply-To: <9PStlNrNfDolJIAQrhWBLXjFGTsRhY0TVZB4YWHwQeI=.56aedabb-f9cc-4362-955b-2aa1367bd026@github.com> References: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> <9PStlNrNfDolJIAQrhWBLXjFGTsRhY0TVZB4YWHwQeI=.56aedabb-f9cc-4362-955b-2aa1367bd026@github.com> Message-ID: On Thu, 20 Oct 2022 07:35:14 GMT, Erik ?sterlund wrote: >> This change adds a missing is_unloading() check for the callee in SharedRuntime::fixup_callers_callsite() and removes from_compiled_entry_no_trampoline() because it is no longer used. > > src/hotspot/share/runtime/sharedRuntime.cpp line 2119: > >> 2117: >> 2118: CodeBlob* cb = CodeCache::find_blob(caller_pc); >> 2119: if (cb == NULL || !cb->is_compiled() || callee->is_unloading()) { > > Why not move the is_unloading check on callee to the if statement just above that checks the callee (as opposed to the callsite)? I guess I was thinking is_unloading() can be a bit expensive the first time it is called, so it might be better to fail for other reasons first. But I believe is_unloading will eventually be called for every nmethod each unloading cycle, so avoiding the cost here just means moving it to somewhere else. I can move it to where you suggest if you like. ------------- PR: https://git.openjdk.org/jdk/pull/10747 From dlong at openjdk.org Fri Oct 21 07:07:52 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Oct 2022 07:07:52 GMT Subject: RFR: 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() In-Reply-To: <9PStlNrNfDolJIAQrhWBLXjFGTsRhY0TVZB4YWHwQeI=.56aedabb-f9cc-4362-955b-2aa1367bd026@github.com> References: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> <9PStlNrNfDolJIAQrhWBLXjFGTsRhY0TVZB4YWHwQeI=.56aedabb-f9cc-4362-955b-2aa1367bd026@github.com> Message-ID: On Thu, 20 Oct 2022 07:38:00 GMT, Erik ?sterlund wrote: >> This change adds a missing is_unloading() check for the callee in SharedRuntime::fixup_callers_callsite() and removes from_compiled_entry_no_trampoline() because it is no longer used. > > src/hotspot/share/runtime/sharedRuntime.cpp line 2113: > >> 2111: NoSafepointVerifier nsv; >> 2112: >> 2113: CompiledMethod* callee = moop->code(); > > There is a moop->code() null check just a few lines below, so now it looks like we are reading the code pointer twice checking if it is null. Is ot enough to do that one time? It's actually the same number of null checks as before, if you look at what from_compiled_entry_no_trampoline() used to do. But I did consider removing the 2nd check, because no matter how late we check, we can always lose the race where it becomes null right after our last check. It's harmless however, so I decided to keep it. ------------- PR: https://git.openjdk.org/jdk/pull/10747 From fyang at openjdk.org Fri Oct 21 07:10:24 2022 From: fyang at openjdk.org (Fei Yang) Date: Fri, 21 Oct 2022 07:10:24 GMT Subject: RFR: 8295711: Rename ZBarrierSetAssembler::load_at parameter name from "tmp_thread" to "tmp2" In-Reply-To: References: Message-ID: On Fri, 21 Oct 2022 06:48:22 GMT, Thomas Schatzl wrote: > Please also fix the `tmp_thread` parameter for x86 while you are at it ;) It looks to me that the case for x86 is different here. 
For x86_32, this formal parameter will be used later for calling 'get_thread()' [1][2]. So I think we should keep its original name. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp#L573 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp#L573 ------------- PR: https://git.openjdk.org/jdk/pull/10783 From egahlin at openjdk.org Fri Oct 21 08:18:59 2022 From: egahlin at openjdk.org (Erik Gahlin) Date: Fri, 21 Oct 2022 08:18:59 GMT Subject: Integrated: 8280131: jcmd reports "Module jdk.jfr not found." when "jdk.management.jfr" is missing In-Reply-To: References: Message-ID: <0DO8aULNCN37qMbqUbnbr7mQItLdYRfKDj22BtU43hc=.ca5cfd6b-5f28-427c-b2a6-f4e5317dfbbf@github.com> On Mon, 17 Oct 2022 09:25:57 GMT, Erik Gahlin wrote: > Could I have a review of a PR that ensures JFR can be used when only the jdk.jfr module is present in an image. > > The behavior is similar to how -javaagent adds the java.instrument module and "jcmd PID ManagementAgent.status" loads the jdk.management.agent module. > > TestJfrJavaBase.java is replaced with TestModularImage.java. The former test could not be used since the jdk.jfr module is now added to the module graph when -XX:StartFlightRecording is specified. > > Testing: tier1-3 + test/jdk/jdk/jfr > > Thanks > Erik This pull request has now been integrated. Changeset: a345df20 Author: Erik Gahlin URL: https://git.openjdk.org/jdk/commit/a345df20d0a85b90e6703fba5582cacc5ba38a6d Stats: 306 lines in 6 files changed: 232 ins; 73 del; 1 mod 8280131: jcmd reports "Module jdk.jfr not found." when "jdk.management.jfr" is missing Reviewed-by: mgronlun, alanb ------------- PR: https://git.openjdk.org/jdk/pull/10723 From luhenry at openjdk.org Fri Oct 21 08:19:51 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Oct 2022 08:19:51 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v6] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Thu, 20 Oct 2022 14:16:52 GMT, Fei Yang wrote: >> Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: >> >> - Explicit use of temp registers >> - fixup! Add -XX:CacheLineSize= to set cache line size > > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 48: > >> 46: >> 47: if (!FLAG_IS_DEFAULT(CacheLineSize) && !is_power_of_2(CacheLineSize)) { >> 48: warning("CacheLineSize must be a power of 2"); > > TBH, I am worried about the case when user specified some inaccurate cache-line size here, especially when the specified value is bigger than the actual cache-line size. The currently implementation won't work in that case. We really need some way to determine the cache-line size at runtime to be safe. That was done to answer https://github.com/openjdk/jdk/pull/10718#discussion_r996410678. Ideally it would be detected at runtime. However, I don't know of any API in RISC-V or Linux to get that information. 
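For what it's worth, a sketch of the kind of runtime probing being asked about, assuming Linux: sysconf(_SC_LEVEL1_DCACHE_LINESIZE) and the generic cacheinfo sysfs node. Whether either source is populated on a given RISC-V kernel, and whether the reported D-cache line size even matches the Zicboz cbo.zero block size, is exactly the open question here, so treat this as an illustration only.

    #include <cstdio>
    #include <unistd.h>

    // Returns a cache-line-size hint in bytes, or 0 if the platform exposes nothing.
    static long cache_line_size_hint() {
    #ifdef _SC_LEVEL1_DCACHE_LINESIZE
      long sz = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);   // glibc extension; 0 or -1 when unknown
      if (sz > 0) return sz;
    #endif
      // Generic Linux cacheinfo node; may be absent or empty on some boards.
      if (FILE* f = std::fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r")) {
        long val = 0;
        int n = std::fscanf(f, "%ld", &val);
        std::fclose(f);
        if (n == 1 && val > 0) return val;
      }
      return 0;   // unknown: fall back to a flag or disable the fast path
    }

    int main() {
      long sz = cache_line_size_hint();
      if (sz > 0) std::printf("cache line size hint: %ld bytes\n", sz);
      else        std::printf("no runtime cache line size hint available\n");
      return 0;
    }

If neither source yields a value, the code would still have to fall back to an explicit flag such as the proposed -XX:CacheLineSize=, or skip the cbo.zero path entirely.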
------------- PR: https://git.openjdk.org/jdk/pull/10718 From aph at openjdk.org Fri Oct 21 08:30:44 2022 From: aph at openjdk.org (Andrew Haley) Date: Fri, 21 Oct 2022 08:30:44 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <4xSGTUVOSKQtJT61dcmi3ORP987OgOT6dvjxWDZhsbg=.a5049926-7471-4b80-9b7a-1a6908298b6c@github.com> On Thu, 20 Oct 2022 20:26:47 GMT, Vladimir Ivanov wrote: > That sounds like a very interesting idea. > > It would be very helpful to get an understanding how much overhead `STMXCSR` plus a branch adds in JNI stub to decide whether it's worth optimizing for. It's not just Intel's implementation of x86, though. Apple M1 takes a big hit when writing the FPCR: It seems to me to wait for all instructions in progress to retire. Given that there are 600 entries in the M1 reorder buffer (!) that's a lot. Of course they could rename the FPCR like anything else, but I guess they don't. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From iwalulya at openjdk.org Fri Oct 21 09:06:22 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 21 Oct 2022 09:06:22 GMT Subject: RFR: 8233697: CHT: Iteration parallelization [v4] In-Reply-To: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Message-ID: > Hi, > > Please review this change to add parallel iteration of the ConcurrentHashTable. The iteration should be done during a safepoint without concurrent modifications to the ConcurrentHashTable. > > Usecase is in parallelizing the merging of large remsets for G1. > > Some background: The problem is that particularly during (G1) mixed gc it happens that the distribution of contents in the CHT is very unbalanced - young gen regions have a very small remembered set (little work), and old gen regions very large ones (much work). > > Since the current work distribution is based on whole remembered sets (i.e. CHTs), this makes for a very unbalanced merge remsets phase in G1 when you have quite a bit more than the number of old gen regions threads at your disposal. > This negatively impacts pause time predictions (and obviously pause times are longer than necessary as many threads are idling to wait for the phase to complete). > > This change only adds the infrastructure code in the CHT, there will be a follow-up with G1 changes. 
> > Testing: tier 1-3 Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: make claim InternalTableClaimer method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10759/files - new: https://git.openjdk.org/jdk/pull/10759/files/b6fe308e..1d736e77 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10759&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10759&range=02-03 Stats: 33 lines in 1 file changed: 14 ins; 14 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10759.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10759/head:pull/10759 PR: https://git.openjdk.org/jdk/pull/10759 From iwalulya at openjdk.org Fri Oct 21 09:06:28 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 21 Oct 2022 09:06:28 GMT Subject: RFR: 8233697: CHT: Iteration parallelization [v3] In-Reply-To: References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Message-ID: On Fri, 21 Oct 2022 06:37:54 GMT, Thomas Schatzl wrote: >> Ivan Walulya has updated the pull request incrementally with two additional commits since the last revision: >> >> - move BucketsClaimer declaration >> - move BucketsClaimer declaration > > src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 1358: > >> 1356: } >> 1357: return false; >> 1358: } > > Minor nit: A better place for this method is probably `InternalTableClaimer`; then you can also probably improve encapsulation (visibility) of the members if you want, but I'm good with keeping everything public (as default for a struct) for this helper class. > I did not think through the suggestion earlier. you are right, i have updated, not certain about the `_table_claimer` and `_new_table_claimer` names, but couldn't come up with better ------------- PR: https://git.openjdk.org/jdk/pull/10759 From luhenry at openjdk.org Fri Oct 21 09:07:06 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Oct 2022 09:07:06 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v7] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - review - Merge branch 'master' of github.com:openjdk/jdk into dev/ludovic/upstream-zicboz - Explicit use of temp registers - fixup! Add -XX:CacheLineSize= to set cache line size - fixup! Add -XX:CacheLineSize= to set cache line size - fixup! Add -XX:CacheLineSize= to set cache line size - fixup! Add -XX:CacheLineSize= to set cache line size - Add -XX:CacheLineSize= to set cache line size - Fix comment - Fix alignement - ... 
and 2 more: https://git.openjdk.org/jdk/compare/ef62b614...ae39b0c0 ------------- Changes: https://git.openjdk.org/jdk/pull/10718/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=06 Stats: 165 lines in 9 files changed: 149 ins; 4 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Fri Oct 21 09:07:06 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Oct 2022 09:07:06 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v6] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Thu, 20 Oct 2022 14:07:23 GMT, Fei Yang wrote: >> Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: >> >> - Explicit use of temp registers >> - fixup! Add -XX:CacheLineSize= to set cache line size > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4127: > >> 4125: sub(cnt, cnt, tmp2); >> 4126: add(tmp3, zr, zr); >> 4127: movptr(tmp3, initial_table_end); > > I think it will be more efficient if we make use of 'auipc' instruction here for this purpose. The current version which makes use of 'movptr' will emits 6 instructions. I'm using `la` which I've modified with `wrap_label`. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Fri Oct 21 09:10:02 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Oct 2022 09:10:02 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v8] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: <26VcwcL6LMWHxRT_LrZJ8hznHRRKCfOvAnW6RjK6sJ4=.eb87e7ab-99ee-4b15-b286-2f0d28dd1691@github.com> > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. 
> > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: Remove unused movptr(Label) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/ae39b0c0..965a0e0d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=06-07 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From jsjolen at openjdk.org Fri Oct 21 09:11:20 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 21 Oct 2022 09:11:20 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v3] In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 13:44:14 GMT, Thomas Stuefe wrote: >> Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary -Xlog:disable > > src/hotspot/share/runtime/deoptimization.cpp line 299: > >> 297: deoptimized_objects = deoptimized_objects || relocked; >> 298: #ifndef PRODUCT >> 299: LogMessage(deoptimization) lm; > > Drive-by: Does this not already incur costs? Does LogMessage not contain an internal buffer it allocates? Sorry, I missed to respond to this: It does incur costs, but it's surrounded by `#ifndef PRODUCT`, so I don't think we mind. ------------- PR: https://git.openjdk.org/jdk/pull/10645 From jsjolen at openjdk.org Fri Oct 21 09:11:24 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 21 Oct 2022 09:11:24 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v2] In-Reply-To: References: Message-ID: <9QvyYhiYYZfEJo4OZAKRyivkCCwwRnOLPWbnXcROS-o=.8bdfcc19-e61c-4b35-bc8f-8b9014012df5@github.com> On Thu, 20 Oct 2022 14:12:47 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/vframe.cpp line 684: >> >>> 682: } >>> 683: void vframe::print_on(outputStream* st) const { >>> 684: _fr.print_value_on(st,NULL); >> >> I think this was changed in response to Coleens' comment. While I agree we are getting rid of `WizardMode` this PR is about `PrintDeoptimizationDetails` and this change seems unrelated to that (unless nothing else calls `vframe::print`?). > > There's a couple other printing function that call vframe::print() so I think that should keep WizardMode on for now until the callers become UL converted. > I don't think the vframe::print_on() version should keep WizardMode since logging calls it with the verbose logging level (guessing this is true, please check!) >You don't need the WizardMode guard here and in print_on. UL is supposed replace WizardMode and Verbose so the correct fix would be to elide it from print_on and only log using print_on when the logging level is logically equivalent to WizardMode (as you have done elsewhere). It was actually in response to what you said :-). I'm OK with either way. 
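As an aside for readers following the WizardMode-versus-UL exchange, the shape both reviewers appear to be steering toward is the usual Unified Logging idiom: test the log level first, then print through a LogStream so the same printer serves both tty and the log. The sketch below is only an illustration that assumes the HotSpot source tree; the helper name log_vframe_details is made up, the Trace level and deoptimization tag are arbitrary choices, and a print_on(outputStream*) method on vframe is assumed rather than taken from the actual patch.

    #include "logging/log.hpp"
    #include "logging/logStream.hpp"
    #include "memory/resourceArea.hpp"
    #include "runtime/vframe.hpp"

    // Hypothetical helper, not part of the PR: prints one vframe only when the
    // 'deoptimization' tag is enabled at trace level, replacing ad-hoc guards
    // such as WizardMode or PrintDeoptimizationDetails.
    static void log_vframe_details(vframe* vf) {
      LogTarget(Trace, deoptimization) lt;
      if (lt.is_enabled()) {   // cheap check; nothing is formatted when logging is off
        ResourceMark rm;       // printing may resource-allocate names
        LogStream ls(lt);
        vf->print_on(&ls);     // assumed print_on(outputStream*); a tty stream works the same way
      }
    }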
------------- PR: https://git.openjdk.org/jdk/pull/10645 From jsjolen at openjdk.org Fri Oct 21 09:58:32 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 21 Oct 2022 09:58:32 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream [v2] In-Reply-To: References: Message-ID: > Hi, > > I went through all of the places where LogStreams are created and removed the unnecessary ResourceMarks. I also added a ResourceMark in one place, where it was needed because of a call to `::name_and_sig_as_C_string` and moved one to the smallest scope where it is used. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Put back VM_Operation::evaluate ResourceMark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10602/files - new: https://git.openjdk.org/jdk/pull/10602/files/bfa88acb..ab939bf8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10602&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10602&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10602.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10602/head:pull/10602 PR: https://git.openjdk.org/jdk/pull/10602 From jsjolen at openjdk.org Fri Oct 21 09:58:33 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 21 Oct 2022 09:58:33 GMT Subject: RFR: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: On Fri, 7 Oct 2022 11:19:55 GMT, Johan Sj?len wrote: > Hi, > > I went through all of the places where LogStreams are created and removed the unnecessary ResourceMarks. I also added a ResourceMark in one place, where it was needed because of a call to `::name_and_sig_as_C_string` and moved one to the smallest scope where it is used. I put back the ResourceMark in `VM_Operation::evaluate` as looking through each VM Operation for unprotected resource usage is infeasible. ------------- PR: https://git.openjdk.org/jdk/pull/10602 From thartmann at openjdk.org Fri Oct 21 10:00:48 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 21 Oct 2022 10:00:48 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions In-Reply-To: References: Message-ID: On Wed, 5 Oct 2022 21:28:26 GMT, vpaprotsk wrote: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 
1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s I executed some quick testing and this fails with: [2022-10-21T09:54:28,696Z] # A fatal error has been detected by the Java Runtime Environment: [2022-10-21T09:54:28,696Z] # [2022-10-21T09:54:28,696Z] # Internal Error (/opt/mach5/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S8380/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/5903b026-cdbd-4aa4-8433-6a45fb7ee593/runs/f75b29aa-40ef-46a5-b323-3a80aaa9aa6b/workspace/open/src/hotspot/cpu/x86/assembler_x86.cpp:5358), pid=2385300, tid=2385302 [2022-10-21T09:54:28,696Z] # Error: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : vector_len == AVX_256bit ? VM_Version::supports_avx2() : vector_len == AVX_512bit ? VM_Version::supports_avx512bw() : 0) failed [2022-10-21T09:54:28,696Z] # [2022-10-21T09:54:28,696Z] # JRE version: (20.0) (fastdebug build ) [2022-10-21T09:54:28,696Z] # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-internal-2022-10-21-0733397.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) [2022-10-21T09:54:28,696Z] # Problematic frame: [2022-10-21T09:54:28,696Z] # V [libjvm.so+0x6e3bf0] Assembler::vpslldq(XMMRegister, XMMRegister, int, int)+0x190 ------------- PR: https://git.openjdk.org/jdk/pull/10582 From stefank at openjdk.org Fri Oct 21 10:25:03 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 21 Oct 2022 10:25:03 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v3] In-Reply-To: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: > Background to this patch: > > This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. > > PR RFC: > > HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: > > MetaspaceObj - allocates in the Metaspace > CHeap - uses malloc > ResourceObj - ... > > The last class sounds like it provide an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Areanas or even CHeap allocated memory. > > This is IMHO misleading, and often leads to confusion among HotSpot developers. > > I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. 
> > In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. > > The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. > > The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge remote-tracking branch 'upstream/master' into 8295475_split_allocation_types - Work around gtest exception compilation issues - Fix Shenandoah - Remove AnyObj new operator taking an allocation_type - Use more specific allocation types ------------- Changes: https://git.openjdk.org/jdk/pull/10745/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10745&range=02 Stats: 486 lines in 158 files changed: 82 ins; 45 del; 359 mod Patch: https://git.openjdk.org/jdk/pull/10745.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10745/head:pull/10745 PR: https://git.openjdk.org/jdk/pull/10745 From tschatzl at openjdk.org Fri Oct 21 11:41:51 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 21 Oct 2022 11:41:51 GMT Subject: RFR: 8233697: CHT: Iteration parallelization [v4] In-Reply-To: References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Message-ID: On Fri, 21 Oct 2022 09:06:22 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to add parallel iteration of the ConcurrentHashTable. The iteration should be done during a safepoint without concurrent modifications to the ConcurrentHashTable. >> >> Usecase is in parallelizing the merging of large remsets for G1. >> >> Some background: The problem is that particularly during (G1) mixed gc it happens that the distribution of contents in the CHT is very unbalanced - young gen regions have a very small remembered set (little work), and old gen regions very large ones (much work). >> >> Since the current work distribution is based on whole remembered sets (i.e. CHTs), this makes for a very unbalanced merge remsets phase in G1 when you have quite a bit more than the number of old gen regions threads at your disposal. >> This negatively impacts pause time predictions (and obviously pause times are longer than necessary as many threads are idling to wait for the phase to complete). >> >> This change only adds the infrastructure code in the CHT, there will be a follow-up with G1 changes. >> >> Testing: tier 1-3 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > make claim InternalTableClaimer method Marked as reviewed by tschatzl (Reviewer). 
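As an aside for readers who have not opened the change itself, the claiming idea is easy to see in isolation: a shared atomic cursor hands out fixed-size ranges of the bucket array, so threads that draw small or empty ranges simply come back for more, and one large table no longer serializes on a single thread. The sketch below is a self-contained illustration of that scheme using standard C++ threads and atomics; the names are invented and it is not the HotSpot ConcurrentHashTable or its InternalTableClaimer.

    #include <algorithm>
    #include <atomic>
    #include <cstddef>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Hands out [start, end) ranges of a bucket array to any number of threads.
    struct BucketRangeClaimer {
      std::atomic<std::size_t> _next{0};
      std::size_t _limit;
      std::size_t _claim_size;
      BucketRangeClaimer(std::size_t limit, std::size_t claim_size)
        : _limit(limit), _claim_size(claim_size) {}
      // Returns false once every range has been handed out.
      bool claim(std::size_t* start, std::size_t* end) {
        std::size_t s = _next.fetch_add(_claim_size, std::memory_order_relaxed);
        if (s >= _limit) return false;
        *start = s;
        *end = std::min(s + _claim_size, _limit);
        return true;
      }
    };

    int main() {
      const std::size_t num_buckets = 1 << 16;
      std::vector<int> buckets(num_buckets, 1);      // pretend each bucket holds one entry
      BucketRangeClaimer claimer(num_buckets, 256);  // 256 buckets per claim
      std::atomic<long> total{0};

      auto worker = [&] {
        std::size_t start, end;
        long local = 0;
        while (claimer.claim(&start, &end)) {
          for (std::size_t i = start; i < end; i++) local += buckets[i];  // "visit" each bucket
        }
        total.fetch_add(local, std::memory_order_relaxed);
      };

      std::vector<std::thread> threads;
      for (int i = 0; i < 4; i++) threads.emplace_back(worker);
      for (auto& t : threads) t.join();
      std::printf("visited %ld entries\n", total.load());
      return 0;
    }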
------------- PR: https://git.openjdk.org/jdk/pull/10759 From vkempik at openjdk.org Fri Oct 21 12:07:49 2022 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 21 Oct 2022 12:07:49 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v8] In-Reply-To: <26VcwcL6LMWHxRT_LrZJ8hznHRRKCfOvAnW6RjK6sJ4=.eb87e7ab-99ee-4b15-b286-2f0d28dd1691@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <26VcwcL6LMWHxRT_LrZJ8hznHRRKCfOvAnW6RjK6sJ4=.eb87e7ab-99ee-4b15-b286-2f0d28dd1691@github.com> Message-ID: <_n8VamAGT5xw9etEA5MsXSieAr5MS1SZjrKvmHiC8mY=.eed14ab6-8752-440b-9584-5771dbe00009@github.com> On Fri, 21 Oct 2022 09:10:02 GMT, Ludovic Henry wrote: >> Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. >> >> [1] https://github.com/riscv/riscv-CMOs >> [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused movptr(Label) I think in some near feature well need to group all of this "UseZicbom,UseZicbop,UzeZba, etc.." flags under one umbrella flag - .e.g UseAppProfile22, and cache line size would be a part of it too. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From vkempik at openjdk.org Fri Oct 21 12:13:54 2022 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 21 Oct 2022 12:13:54 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v6] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Fri, 21 Oct 2022 08:17:32 GMT, Ludovic Henry wrote: >> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 48: >> >>> 46: >>> 47: if (!FLAG_IS_DEFAULT(CacheLineSize) && !is_power_of_2(CacheLineSize)) { >>> 48: warning("CacheLineSize must be a power of 2"); >> >> TBH, I am worried about the case when user specified some inaccurate cache-line size here, especially when the specified value is bigger than the actual cache-line size. The currently implementation won't work in that case. We really need some way to determine the cache-line size at runtime to be safe. > > That was done to answer https://github.com/openjdk/jdk/pull/10718#discussion_r996410678. Ideally it would be detected at runtime. However, I don't know of any API in RISC-V or Linux to get that information. This could be an answer to the issue - https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva22-profiles Zic64b Cache blocks must be 64 bytes in size, naturally aligned in the address space. The following mandatory feature was further restricted in RVA22U64: Note | While the general RISC-V specifications are agnostic to cache block size, selecting a common cache block size simplifies the specification and use of the following cache-block extensions within the application processor profile. Software does not have to query a discovery mechanism and/or provide dynamic dispatch to the appropriate code. We choose 64 bytes at it is effectively an industry standard. Implementations may use longer cache blocks to reduce tag cost provided they use 64-byte sub-blocks to remain compatible. 
Implementations may use shorter cache blocks provided they sequence cache operations across the multiple cache blocks comprising a 64-byte block to remain compatible. We can already create a flag: e.g. -XX:+UseRVA22U64BASE which will activate Zba, Zbb, Zicbom and set cache block size to 64-bytes ------------- PR: https://git.openjdk.org/jdk/pull/10718 From aboldtch at openjdk.org Fri Oct 21 13:21:49 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 21 Oct 2022 13:21:49 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v3] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On Fri, 21 Oct 2022 10:25:03 GMT, Stefan Karlsson wrote: >> Background to this patch: >> >> This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. >> >> PR RFC: >> >> HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: >> >> MetaspaceObj - allocates in the Metaspace >> CHeap - uses malloc >> ResourceObj - ... >> >> The last class sounds like it provide an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Areanas or even CHeap allocated memory. >> >> This is IMHO misleading, and often leads to confusion among HotSpot developers. >> >> I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. >> >> In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. >> >> The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. >> >> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge remote-tracking branch 'upstream/master' into 8295475_split_allocation_types > - Work around gtest exception compilation issues > - Fix Shenandoah > - Remove AnyObj new operator taking an allocation_type > - Use more specific allocation types Seems like the riscv port uses virtual destructors on classes that inherits from ResourceObj. This requires the delete operator to be defined. [C++ Standard](https://eel.is/c++draft/class.dtor#16) It occurs in the Assembler, MacroAssembler and InterpreterMacroAssembler which all have empty virtual destructors. And in SignatureHandlerGenerator which NULLs an internal field. Any of the RISCV porters that know why virtual destructors are used in this way, and if they are necessary. I can compile `make CONF=riscv hotspot` with the virtual destructors removed. 
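As a side note on the virtual-destructor question, the rule being cited is easy to reproduce outside HotSpot: defining a virtual destructor makes the compiler look up a usable operator delete for the class (for the implicit deleting destructor), even if no object is ever deleted dynamically, so an allocation-policy base class that forbids individual deletion clashes with derived classes that add virtual destructors. The sketch below uses invented class names, not ResourceObj or AnyObj, and only illustrates that language rule.

    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>

    struct ArenaOnly {                         // "allocate in an arena, never delete individually"
      void* operator new(std::size_t) = delete;   // force arena/placement allocation elsewhere
      void  operator delete(void*)    = delete;   // individual deletion is forbidden
      // virtual ~ArenaOnly() {}  // <-- making this virtual is ill-formed: defining a virtual
      //                          //     destructor requires a usable operator delete
      ~ArenaOnly() {}             // a non-virtual destructor needs no operator delete
      int payload = 0;
    };

    struct Deletable {                         // provides a usable delete ...
      void* operator new(std::size_t size) { return std::malloc(size); }  // no failure handling here
      void  operator delete(void* p) { std::free(p); }
      virtual ~Deletable() { std::puts("~Deletable"); }   // ... so a virtual destructor is fine
    };

    struct Derived : Deletable {
      ~Derived() override { std::puts("~Derived"); }
    };

    int main() {
      ArenaOnly a;                 // automatic storage: no allocation functions involved
      a.payload = 42;
      std::printf("payload=%d\n", a.payload);
      Deletable* d = new Derived();
      delete d;                    // virtual dispatch runs ~Derived, then ~Deletable
      return 0;
    }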
------------- PR: https://git.openjdk.org/jdk/pull/10745 From mneugschwand at openjdk.org Fri Oct 21 13:45:38 2022 From: mneugschwand at openjdk.org (Matthias Neugschwandtner) Date: Fri, 21 Oct 2022 13:45:38 GMT Subject: RFR: 8295776: [JVMCI] Add AMD64 CPU flags for MPK and CET Message-ID: Add the CPU flags for memory protection keys (MPK) and control flow enforcement technology (CET) to JVMCI such that Graal can make use of these technologies. ------------- Commit messages: - 8295776 - Add AMD64 CPU flags for MPK and CET Changes: https://git.openjdk.org/jdk/pull/10810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10810&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295776 Stats: 37 lines in 4 files changed: 34 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10810.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10810/head:pull/10810 PR: https://git.openjdk.org/jdk/pull/10810 From luhenry at openjdk.org Fri Oct 21 13:56:51 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Oct 2022 13:56:51 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v6] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: <6JsKSivFGy0UZSZNs9KL_NNScIPoqNsPNfeJwc0CWAo=.0b3e8404-9e61-48ca-bf8d-19501c5222cf@github.com> On Fri, 21 Oct 2022 12:11:28 GMT, Vladimir Kempik wrote: >> That was done to answer https://github.com/openjdk/jdk/pull/10718#discussion_r996410678. Ideally it would be detected at runtime. However, I don't know of any API in RISC-V or Linux to get that information. > > This could be an answer to the issue - https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva22-profiles > > The following mandatory feature was further restricted in RVA22U64: > .... > Zic64b Cache blocks must be 64 bytes in size, naturally aligned in the address space. > > Note | While the general RISC-V specifications are agnostic to cache block size, selecting a common cache block size simplifies the specification and use of the following cache-block extensions within the application processor profile. Software does not have to query a discovery mechanism and/or provide dynamic dispatch to the appropriate code. We choose 64 bytes at it is effectively an industry standard. Implementations may use longer cache blocks to reduce tag cost provided they use 64-byte sub-blocks to remain compatible. Implementations may use shorter cache blocks provided they sequence cache operations across the multiple cache blocks comprising a 64-byte block to remain compatible. > > We can already create a flag: e.g. -XX:+UseRVA22U64BASE which will activate Zba, Zbb, Zicbom and set cache block size to 64-bytes I'll add a check like the following: if (!UseZic64b) { if (FLAG_IS_DEFAULT(UseZicboz)) { FLAG_SET_DEFAULT(UseZicboz, false); } } ------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Fri Oct 21 14:52:13 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Oct 2022 14:52:13 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v9] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. 
This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: Make use of RVA22U64 profile and Zic64b extension to guide CacheLineSize value ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/965a0e0d..52047b6a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=07-08 Stats: 50 lines in 2 files changed: 45 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Fri Oct 21 14:52:13 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Oct 2022 14:52:13 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v8] In-Reply-To: <_n8VamAGT5xw9etEA5MsXSieAr5MS1SZjrKvmHiC8mY=.eed14ab6-8752-440b-9584-5771dbe00009@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <26VcwcL6LMWHxRT_LrZJ8hznHRRKCfOvAnW6RjK6sJ4=.eb87e7ab-99ee-4b15-b286-2f0d28dd1691@github.com> <_n8VamAGT5xw9etEA5MsXSieAr5MS1SZjrKvmHiC8mY=.eed14ab6-8752-440b-9584-5771dbe00009@github.com> Message-ID: On Fri, 21 Oct 2022 12:04:04 GMT, Vladimir Kempik wrote: >> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused movptr(Label) > > I think in some near feature well need to group all of this "UseZicbom,UseZicbop,UzeZba, etc.." flags under one umbrella flag - .e.g UseAppProfile22, and cache line size would be a part of it too. @VladimirKempik I've added https://github.com/openjdk/jdk/pull/10718/files#diff-7b173d6e5834de13749c8333192fef5a874628a67b90a5d8d06235d507542ac4R39-R66 to do that. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Fri Oct 21 14:52:13 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Oct 2022 14:52:13 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v6] In-Reply-To: <6JsKSivFGy0UZSZNs9KL_NNScIPoqNsPNfeJwc0CWAo=.0b3e8404-9e61-48ca-bf8d-19501c5222cf@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <6JsKSivFGy0UZSZNs9KL_NNScIPoqNsPNfeJwc0CWAo=.0b3e8404-9e61-48ca-bf8d-19501c5222cf@github.com> Message-ID: <2Q4BrTHVCbUEAKQTkf-6dihxtrv_MyC1kNUErKwR35c=.56aeb147-038e-4129-907a-a04bacb80d8e@github.com> On Fri, 21 Oct 2022 13:54:33 GMT, Ludovic Henry wrote: >> This could be an answer to the issue - https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva22-profiles >> >> The following mandatory feature was further restricted in RVA22U64: >> .... >> Zic64b Cache blocks must be 64 bytes in size, naturally aligned in the address space. >> >> Note | While the general RISC-V specifications are agnostic to cache block size, selecting a common cache block size simplifies the specification and use of the following cache-block extensions within the application processor profile. Software does not have to query a discovery mechanism and/or provide dynamic dispatch to the appropriate code. We choose 64 bytes at it is effectively an industry standard. 
Implementations may use longer cache blocks to reduce tag cost provided they use 64-byte sub-blocks to remain compatible. Implementations may use shorter cache blocks provided they sequence cache operations across the multiple cache blocks comprising a 64-byte block to remain compatible. >> >> We can already create a flag: e.g. -XX:+UseRVA22U64BASE which will activate Zba, Zbb, Zicbom and set cache block size to 64-bytes > > I'll add a check like the following: > > if (!UseZic64b) { > if (FLAG_IS_DEFAULT(UseZicboz)) { > FLAG_SET_DEFAULT(UseZicboz, false); > } > } I've added https://github.com/openjdk/jdk/pull/10718/files#diff-7b173d6e5834de13749c8333192fef5a874628a67b90a5d8d06235d507542ac4R68-R79. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Fri Oct 21 14:57:14 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Oct 2022 14:57:14 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v10] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: Add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/52047b6a..4d91b312 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=08-09 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Fri Oct 21 15:32:02 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 21 Oct 2022 15:32:02 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. 
> > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: - Disable block zeroing in case CacheLineSize isn't the default value - Disable UseZicboz if CacheLineSize is set by user ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/4d91b312..e232c1b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=09-10 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From kvn at openjdk.org Fri Oct 21 16:21:47 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Oct 2022 16:21:47 GMT Subject: RFR: 8295776: [JVMCI] Add AMD64 CPU flags for MPK and CET In-Reply-To: References: Message-ID: On Fri, 21 Oct 2022 09:48:23 GMT, Matthias Neugschwandtner wrote: > Add the CPU flags for memory protection keys (MPK) and control flow enforcement technology (CET) to JVMCI such that Graal can make use of these technologies. Changes look fine. I approve them. The only confusion I have is platform name in title (yes, I know it is because Graal still uses AMD64 as platform name). May be better use x86 in title (I assume it is not 64-bit specific feature). I see these features are implemented in Intel's CPUs. Do AMD also have them? ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10810 From duke at openjdk.org Fri Oct 21 18:09:51 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 21 Oct 2022 18:09:51 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions In-Reply-To: References: Message-ID: On Fri, 21 Oct 2022 09:57:14 GMT, Tobias Hartmann wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 
56.147 ops/s > > I executed some quick testing and this fails with: > > > [2022-10-21T09:54:28,696Z] # A fatal error has been detected by the Java Runtime Environment: > [2022-10-21T09:54:28,696Z] # > [2022-10-21T09:54:28,696Z] # Internal Error (/opt/mach5/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S8380/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/5903b026-cdbd-4aa4-8433-6a45fb7ee593/runs/f75b29aa-40ef-46a5-b323-3a80aaa9aa6b/workspace/open/src/hotspot/cpu/x86/assembler_x86.cpp:5358), pid=2385300, tid=2385302 > [2022-10-21T09:54:28,696Z] # Error: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : vector_len == AVX_256bit ? VM_Version::supports_avx2() : vector_len == AVX_512bit ? VM_Version::supports_avx512bw() : 0) failed > [2022-10-21T09:54:28,696Z] # > [2022-10-21T09:54:28,696Z] # JRE version: (20.0) (fastdebug build ) > [2022-10-21T09:54:28,696Z] # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-internal-2022-10-21-0733397.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > [2022-10-21T09:54:28,696Z] # Problematic frame: > [2022-10-21T09:54:28,696Z] # V [libjvm.so+0x6e3bf0] Assembler::vpslldq(XMMRegister, XMMRegister, int, int)+0x190 Hi @TobiHartmann , thanks for looking. Could you share CPU Model and flags from `hs_err` please? ------------- PR: https://git.openjdk.org/jdk/pull/10582 From kvn at openjdk.org Fri Oct 21 18:23:08 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Oct 2022 18:23:08 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions In-Reply-To: References: Message-ID: On Wed, 5 Oct 2022 21:28:26 GMT, vpaprotsk wrote: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s Test: jdk/incubator/vector/VectorMaxConversionTests.java#id1 Flags: `-ea -esa -XX:UseAVX=3 -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting -XX:+UseZGC` CPU: Intel 8358 (all AVX512 features). 
I think the problem is this subtest runs with ` -XX:+UseKNLSetting`[VectorMaxConversionTests.java#L50](https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorMaxConversionTests.java#L50) which limits AVX512 features. Call stack: V [libjvm.so+0x6e3bf0] Assembler::vpslldq(XMMRegister, XMMRegister, int, int)+0x190 (assembler_x86.cpp:5358) V [libjvm.so+0x152a23b] MacroAssembler::poly1305_process_blocks_avx512(Register, Register, Register, Register, Register, Register, Register, Register)+0xc7b (macroAssembler_x86_poly.cpp:590) V [libjvm.so+0x152c23d] MacroAssembler::poly1305_process_blocks(Register, Register, Register, Register)+0x3ad (macroAssembler_x86_poly.cpp:849) V [libjvm.so+0x192dc00] StubGenerator::generate_poly1305_processBlocks()+0x170 (stubGenerator_x86_64.cpp:2069) V [libjvm.so+0x1936a89] StubGenerator::generate_initial()+0x419 (stubGenerator_x86_64.cpp:3798) V [libjvm.so+0x1937b78] StubGenerator_generate(CodeBuffer*, int)+0xf8 (stubGenerator_x86_64.hpp:526) V [libjvm.so+0x198e695] StubRoutines::initialize1() [clone .part.0]+0x155 (stubRoutines.cpp:229) V [libjvm.so+0xfc4342] init_globals()+0x32 (init.cpp:123) V [libjvm.so+0x1a7268f] Threads::create_vm(JavaVMInitArgs*, bool*)+0x37f ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Oct 21 18:39:38 2022 From: duke at openjdk.org (Matias Saavedra Silva) Date: Fri, 21 Oct 2022 18:39:38 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v11] In-Reply-To: <867TdKmJce7HpB-gIDtPbGAnZlEloOjgLjRfFIwGZgk=.2d7b3a8d-a54a-4363-81ef-a175a34aeebe@github.com> References: <867TdKmJce7HpB-gIDtPbGAnZlEloOjgLjRfFIwGZgk=.2d7b3a8d-a54a-4363-81ef-a175a34aeebe@github.com> Message-ID: <3pf4ZhDp-3EVyZ2U28L4m8HitjnKsLb2Q6RJGw58mQk=.b2ee8da0-37bb-4839-aa82-c89ea342506a@github.com> On Thu, 22 Sep 2022 18:03:44 GMT, Ioi Lam wrote: >> Current in gdb, you can print information about a class or method with something like >> >> >> call ((InstanceKlass*)0x00000008000411b8)->print_on(tty) >> call ((Method*)0x00007fffb4000d08)->print_codes_on(tty) >> >> >> However, it's difficult to find a class or method by its name and print out its contents. >> >> This RFE adds 3 new functions in debug.cpp so you can easily find classes/methods and print out their contents. They all have a `flags` argument that controls the verbosity. >> >> - `findclass()`: class name only >> - `findmethod()`: class name and method name >> - `findmethod2()`: class name and method name/signature >> >> I also cleaned up `BytecodeTracer` to remove unnecessary complexity. 
>> >> Here are some examples: >> >> >> (gdb) call findclass("java/lang/Object", 0) >> [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' >> >> (gdb) call findclass("java/lang/Object", 1) >> [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' >> 0x00007fffb4000658 : ()V >> 0x00007fffb40010f0 finalize : ()V >> 0x00007fffb4000f00 wait0 : (J)V >> 0x00007fffb40008e8 equals : (Ljava/lang/Object;)Z >> 0x00007fffb4000aa0 toString : ()Ljava/lang/String; >> 0x00007fffb40007f0 hashCode : ()I >> 0x00007fffb4000720 getClass : ()Ljava/lang/Class; >> 0x00007fffb40009a0 clone : ()Ljava/lang/Object; >> 0x00007fffb4000b50 notify : ()V >> 0x00007fffb4000c20 notifyAll : ()V >> 0x00007fffb4000e50 wait : (J)V >> 0x00007fffb4001028 wait : (JI)V >> 0x00007fffb4000d08 wait : ()V >> >> (gdb) call findclass("*ClassLoader", 0) >> [0] 0x000000080007de40 jdk.internal.loader.ClassLoaders$BootClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' >> [1] 0x0000000800053c58 jdk.internal.loader.ClassLoaders$PlatformClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' >> [2] 0x0000000800053918 jdk.internal.loader.ClassLoaders$AppClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' >> [....] >> >> (gdb) call findmethod2("*ang/Object*", "wait", "()V", 0x7) >> [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' >> 0x00007fffb4000d08 wait : ()V >> 0x00007fffb4000ce8 0 fast_aload_0 >> 0x00007fffb4000ce9 1 lconst_0 >> 0x00007fffb4000cea 2 invokevirtual 38 >> 0x00007fffb4000ced 5 return > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @coleenp comments Overall, this looks good to me! I just have two comments: 1. findmethod2() could use a better name, maybe something along the lines of find_method_and_signature()? Not sure if these method calls need to be in camel case or snake case as well. 2. I noticed the use of continue statements in classFilePrinter: that code could probably be refactored into something that avoids the use of continue ------------- PR: https://git.openjdk.org/jdk/pull/9957 From coleenp at openjdk.org Fri Oct 21 19:08:12 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 21 Oct 2022 19:08:12 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v11] In-Reply-To: <867TdKmJce7HpB-gIDtPbGAnZlEloOjgLjRfFIwGZgk=.2d7b3a8d-a54a-4363-81ef-a175a34aeebe@github.com> References: <867TdKmJce7HpB-gIDtPbGAnZlEloOjgLjRfFIwGZgk=.2d7b3a8d-a54a-4363-81ef-a175a34aeebe@github.com> Message-ID: <4pVu3JYIKBDS9jMG47NPjqNmDCv_ZWX6h7CsAALE8So=.d64963a1-f126-4d9a-bdec-52f6b9ebfc17@github.com> On Thu, 22 Sep 2022 18:03:44 GMT, Ioi Lam wrote: >> Current in gdb, you can print information about a class or method with something like >> >> >> call ((InstanceKlass*)0x00000008000411b8)->print_on(tty) >> call ((Method*)0x00007fffb4000d08)->print_codes_on(tty) >> >> >> However, it's difficult to find a class or method by its name and print out its contents. >> >> This RFE adds 3 new functions in debug.cpp so you can easily find classes/methods and print out their contents. They all have a `flags` argument that controls the verbosity. >> >> - `findclass()`: class name only >> - `findmethod()`: class name and method name >> - `findmethod2()`: class name and method name/signature >> >> I also cleaned up `BytecodeTracer` to remove unnecessary complexity. 
>> >> Here are some examples: >> >> >> (gdb) call findclass("java/lang/Object", 0) >> [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' >> >> (gdb) call findclass("java/lang/Object", 1) >> [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' >> 0x00007fffb4000658 : ()V >> 0x00007fffb40010f0 finalize : ()V >> 0x00007fffb4000f00 wait0 : (J)V >> 0x00007fffb40008e8 equals : (Ljava/lang/Object;)Z >> 0x00007fffb4000aa0 toString : ()Ljava/lang/String; >> 0x00007fffb40007f0 hashCode : ()I >> 0x00007fffb4000720 getClass : ()Ljava/lang/Class; >> 0x00007fffb40009a0 clone : ()Ljava/lang/Object; >> 0x00007fffb4000b50 notify : ()V >> 0x00007fffb4000c20 notifyAll : ()V >> 0x00007fffb4000e50 wait : (J)V >> 0x00007fffb4001028 wait : (JI)V >> 0x00007fffb4000d08 wait : ()V >> >> (gdb) call findclass("*ClassLoader", 0) >> [0] 0x000000080007de40 jdk.internal.loader.ClassLoaders$BootClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' >> [1] 0x0000000800053c58 jdk.internal.loader.ClassLoaders$PlatformClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' >> [2] 0x0000000800053918 jdk.internal.loader.ClassLoaders$AppClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' >> [....] >> >> (gdb) call findmethod2("*ang/Object*", "wait", "()V", 0x7) >> [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' >> 0x00007fffb4000d08 wait : ()V >> 0x00007fffb4000ce8 0 fast_aload_0 >> 0x00007fffb4000ce9 1 lconst_0 >> 0x00007fffb4000cea 2 invokevirtual 38 >> 0x00007fffb4000ced 5 return > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @coleenp comments This seems fine. I had a few comments, not many. src/hotspot/share/interpreter/bytecodeTracer.cpp line 171: > 169: void BytecodeTracer::trace_interpreter(const methodHandle& method, address bcp, uintptr_t tos, uintptr_t tos2, outputStream* st) { > 170: if (TraceBytecodes && BytecodeCounter::counter_value() >= TraceBytecodesAt) { > 171: ttyLocker ttyl; // 5065316: keep the following output coherent We should file a a RFE to make this a leaf mutex rather than ttyLocker (which does get broken to allow safepoints!) with no_safepoint_check assuming that the printing doesn't safepoint. Good to remove the no-longer accurate comment. src/hotspot/share/utilities/debug.cpp line 664: > 662: } > 663: > 664: extern "C" JNIEXPORT void printclass(intptr_t k, int flags) { It's weird that the function to print the method is findmethod, and the one to print the class is printclass. I see that findmethod is consistent with the 'find' functions above. Maybe this should be findclass() too. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/9957 From duke at openjdk.org Fri Oct 21 20:12:05 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 21 Oct 2022 20:12:05 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v2] In-Reply-To: References: Message-ID: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. 
> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: Stash: fetch limbs directly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/7e070d9e..6a60c128 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=00-01 Stats: 61 lines in 3 files changed: 28 ins; 1 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Oct 21 20:13:29 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 21 Oct 2022 20:13:29 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions In-Reply-To: References: Message-ID: On Wed, 5 Oct 2022 21:28:26 GMT, vpaprotsk wrote: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s (Apologies, ignore the `Stash: fetch limbs directly` commit.. got git commit command mixed up.. 
will force-push a fix to the crash in a sec) ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Oct 21 20:20:58 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 21 Oct 2022 20:20:58 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v3] In-Reply-To: References: Message-ID: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s vpaprotsk has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: further restrict UsePolyIntrinsics with supports_avx512vlbw ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/6a60c128..f048f938 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=01-02 Stats: 62 lines in 4 files changed: 1 ins; 28 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Oct 21 20:28:56 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 21 Oct 2022 20:28:56 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions In-Reply-To: References: Message-ID: On Fri, 21 Oct 2022 18:20:10 GMT, Vladimir Kozlov wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. 
>> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > Test: jdk/incubator/vector/VectorMaxConversionTests.java#id1 > Flags: `-ea -esa -XX:UseAVX=3 -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting -XX:+UseZGC` > CPU: Intel 8358 (all AVX512 features). > > I think the problem is this subtest runs with ` -XX:+UseKNLSetting`[VectorMaxConversionTests.java#L50](https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorMaxConversionTests.java#L50) which limits AVX512 features. > > Call stack: > > V [libjvm.so+0x6e3bf0] Assembler::vpslldq(XMMRegister, XMMRegister, int, int)+0x190 (assembler_x86.cpp:5358) > V [libjvm.so+0x152a23b] MacroAssembler::poly1305_process_blocks_avx512(Register, Register, Register, Register, Register, Register, Register, Register)+0xc7b (macroAssembler_x86_poly.cpp:590) > V [libjvm.so+0x152c23d] MacroAssembler::poly1305_process_blocks(Register, Register, Register, Register)+0x3ad (macroAssembler_x86_poly.cpp:849) > V [libjvm.so+0x192dc00] StubGenerator::generate_poly1305_processBlocks()+0x170 (stubGenerator_x86_64.cpp:2069) > V [libjvm.so+0x1936a89] StubGenerator::generate_initial()+0x419 (stubGenerator_x86_64.cpp:3798) > V [libjvm.so+0x1937b78] StubGenerator_generate(CodeBuffer*, int)+0xf8 (stubGenerator_x86_64.hpp:526) > V [libjvm.so+0x198e695] StubRoutines::initialize1() [clone .part.0]+0x155 (stubRoutines.cpp:229) > V [libjvm.so+0xfc4342] init_globals()+0x32 (init.cpp:123) > V [libjvm.so+0x1a7268f] Threads::create_vm(JavaVMInitArgs*, bool*)+0x37f Thanks @vnkozlov, was able to reproduce. @TobiHartmann, I added `supports_avx512vlbw` check to `UsePolyIntrinsics`. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From lmesnik at openjdk.org Fri Oct 21 21:39:23 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 21 Oct 2022 21:39:23 GMT Subject: RFR: 8294486: Remove vmTestbase/nsk/jvmti/ tests ported to serviceability/jvmti. [v2] In-Reply-To: References: Message-ID: > The fix removes nsk/jvmti/ tests ported to serviceability/jvmti and forward-ports corresponding fixed. The suspend/resume tests require more work covered by https://bugs.openjdk.org/browse/JDK-8295169. 
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10665/files - new: https://git.openjdk.org/jdk/pull/10665/files/fe44bf6f..35dbf0cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10665&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10665&range=00-01 Stats: 645 lines in 9 files changed: 2 ins; 643 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10665.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10665/head:pull/10665 PR: https://git.openjdk.org/jdk/pull/10665 From lmesnik at openjdk.org Fri Oct 21 21:39:26 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 21 Oct 2022 21:39:26 GMT Subject: RFR: 8294486: Remove vmTestbase/nsk/jvmti/ tests ported to serviceability/jvmti. In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 19:00:11 GMT, Leonid Mesnik wrote: > The fix removes nsk/jvmti/ tests ported to serviceability/jvmti and forward-ports corresponding fixed. The suspend/resume tests require more work covered by https://bugs.openjdk.org/browse/JDK-8295169. Thank you, I deleted vmTestbase/nsk/jvmti/ThreadEnd which were ported and returned back RunAgentThread entries. ------------- PR: https://git.openjdk.org/jdk/pull/10665 From eosterlund at openjdk.org Fri Oct 21 22:22:20 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 21 Oct 2022 22:22:20 GMT Subject: RFR: 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() In-Reply-To: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> References: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> Message-ID: On Tue, 18 Oct 2022 17:37:30 GMT, Dean Long wrote: > This change adds a missing is_unloading() check for the callee in SharedRuntime::fixup_callers_callsite() and removes from_compiled_entry_no_trampoline() because it is no longer used. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/10747 From eosterlund at openjdk.org Fri Oct 21 22:22:20 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 21 Oct 2022 22:22:20 GMT Subject: RFR: 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() In-Reply-To: References: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> <9PStlNrNfDolJIAQrhWBLXjFGTsRhY0TVZB4YWHwQeI=.56aedabb-f9cc-4362-955b-2aa1367bd026@github.com> Message-ID: On Fri, 21 Oct 2022 07:04:03 GMT, Dean Long wrote: >> src/hotspot/share/runtime/sharedRuntime.cpp line 2113: >> >>> 2111: NoSafepointVerifier nsv; >>> 2112: >>> 2113: CompiledMethod* callee = moop->code(); >> >> There is a moop->code() null check just a few lines below, so now it looks like we are reading the code pointer twice checking if it is null. Is ot enough to do that one time? > > It's actually the same number of null checks as before, if you look at what from_compiled_entry_no_trampoline() used to do. But I did consider removing the 2nd check, because no matter how late we check, we can always lose the race where it becomes null right after our last check. It's harmless however, so I decided to keep it. Okay. That seems fine. 
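For illustration, a self-contained mock (plain C++, not the actual sharedRuntime.cpp code) of the bail-out pattern these review comments discuss: the callsite fixup is only an optimization, so when the callee's code pointer is NULL or the nmethod is unloading, the fixup is simply skipped, and losing the race after the check is harmless.

#include <atomic>

struct MockNMethod {
  std::atomic<bool> unloading{false};
  bool is_unloading() const { return unloading.load(std::memory_order_acquire); }
};

struct MockMethod {
  std::atomic<MockNMethod*> code{nullptr};   // may be cleared or replaced concurrently
};

// Returns true only when it is still worthwhile (and safe) to patch the caller.
bool should_fixup_callsite(MockMethod* m, bool caller_is_compiled) {
  MockNMethod* callee = m->code.load(std::memory_order_acquire);  // read the pointer once
  if (callee == nullptr || callee->is_unloading()) {
    return false;   // nothing stable to patch to; skipping the fixup is harmless
  }
  return caller_is_compiled;  // only compiled callers have a callsite worth patching
}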
>> src/hotspot/share/runtime/sharedRuntime.cpp line 2119: >> >>> 2117: >>> 2118: CodeBlob* cb = CodeCache::find_blob(caller_pc); >>> 2119: if (cb == NULL || !cb->is_compiled() || callee->is_unloading()) { >> >> Why not move the is_unloading check on callee to the if statement just above that checks the callee (as opposed to the callsite)? > > I guess I was thinking is_unloading() can be a bit expensive the first time it is called, so it might be better to fail for other reasons first. But I believe is_unloading will eventually be called for every nmethod each unloading cycle, so avoiding the cost here just means moving it to somewhere else. I can move it to where you suggest if you like. Okay I see. I'll leave it to you to decide if you prefer to move it or not. I'm okay with it either way. ------------- PR: https://git.openjdk.org/jdk/pull/10747 From kbarrett at openjdk.org Sat Oct 22 01:44:36 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 22 Oct 2022 01:44:36 GMT Subject: RFR: 8295808: GrowableArray should support capacity management Message-ID: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> Please review this change to GrowableArray to support capacity management. Two functions are added to GrowableArray, reserve and shrink_to_fit. Also renamed the max_length function to capacity. Used these new functions in StringDedupTable. Testing: mach5 tier1-3 ------------- Commit messages: - copyrights - use reserve/shrink_to_fit in StringDedupTable - gtests for capacity functions - add reserve and shrink_to_fit - max_length() => capacity() - initial_capacity => capacity - capacity nomenclature Changes: https://git.openjdk.org/jdk/pull/10827/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10827&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295808 Stats: 195 lines in 8 files changed: 90 ins; 19 del; 86 mod Patch: https://git.openjdk.org/jdk/pull/10827.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10827/head:pull/10827 PR: https://git.openjdk.org/jdk/pull/10827 From dlong at openjdk.org Sat Oct 22 02:15:10 2022 From: dlong at openjdk.org (Dean Long) Date: Sat, 22 Oct 2022 02:15:10 GMT Subject: RFR: 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() In-Reply-To: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> References: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> Message-ID: On Tue, 18 Oct 2022 17:37:30 GMT, Dean Long wrote: > This change adds a missing is_unloading() check for the callee in SharedRuntime::fixup_callers_callsite() and removes from_compiled_entry_no_trampoline() because it is no longer used. Thanks Erik. ------------- PR: https://git.openjdk.org/jdk/pull/10747 From dlong at openjdk.org Sat Oct 22 02:15:11 2022 From: dlong at openjdk.org (Dean Long) Date: Sat, 22 Oct 2022 02:15:11 GMT Subject: Integrated: 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() In-Reply-To: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> References: <4gzYi89W2pU9jqHEaiDlrz_BV9_Lh9B7zoaYNcAKGdE=.03c924af-1a84-4fe0-adbc-b2ca96b2c24f@github.com> Message-ID: On Tue, 18 Oct 2022 17:37:30 GMT, Dean Long wrote: > This change adds a missing is_unloading() check for the callee in SharedRuntime::fixup_callers_callsite() and removes from_compiled_entry_no_trampoline() because it is no longer used. 
This pull request has now been integrated. Changeset: b5efa2af Author: Dean Long URL: https://git.openjdk.org/jdk/commit/b5efa2afe268e3171f54d8488ef69bf67059bd7f Stats: 22 lines in 3 files changed: 9 ins; 12 del; 1 mod 8294538: missing is_unloading() check in SharedRuntime::fixup_callers_callsite() Reviewed-by: kvn, thartmann, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/10747 From iklam at openjdk.org Sun Oct 23 05:25:22 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 23 Oct 2022 05:25:22 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v12] In-Reply-To: References: Message-ID: > Current in gdb, you can print information about a class or method with something like > > > call ((InstanceKlass*)0x00000008000411b8)->print_on(tty) > call ((Method*)0x00007fffb4000d08)->print_codes_on(tty) > > > However, it's difficult to find a class or method by its name and print out its contents. > > This RFE adds 3 new functions in debug.cpp so you can easily find classes/methods and print out their contents. They all have a `flags` argument that controls the verbosity. > > - `findclass()`: class name only > - `findmethod()`: class name and method name > - `findmethod2()`: class name and method name/signature > > I also cleaned up `BytecodeTracer` to remove unnecessary complexity. > > Here are some examples: > > > (gdb) call findclass("java/lang/Object", 0) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > > (gdb) call findclass("java/lang/Object", 1) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000658 : ()V > 0x00007fffb40010f0 finalize : ()V > 0x00007fffb4000f00 wait0 : (J)V > 0x00007fffb40008e8 equals : (Ljava/lang/Object;)Z > 0x00007fffb4000aa0 toString : ()Ljava/lang/String; > 0x00007fffb40007f0 hashCode : ()I > 0x00007fffb4000720 getClass : ()Ljava/lang/Class; > 0x00007fffb40009a0 clone : ()Ljava/lang/Object; > 0x00007fffb4000b50 notify : ()V > 0x00007fffb4000c20 notifyAll : ()V > 0x00007fffb4000e50 wait : (J)V > 0x00007fffb4001028 wait : (JI)V > 0x00007fffb4000d08 wait : ()V > > (gdb) call findclass("*ClassLoader", 0) > [0] 0x000000080007de40 jdk.internal.loader.ClassLoaders$BootClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [1] 0x0000000800053c58 jdk.internal.loader.ClassLoaders$PlatformClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [2] 0x0000000800053918 jdk.internal.loader.ClassLoaders$AppClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [....] > > (gdb) call findmethod2("*ang/Object*", "wait", "()V", 0x7) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000d08 wait : ()V > 0x00007fffb4000ce8 0 fast_aload_0 > 0x00007fffb4000ce9 1 lconst_0 > 0x00007fffb4000cea 2 invokevirtual 38 > 0x00007fffb4000ced 5 return Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 15 additional commits since the last revision: - added gtest case - removed unnecessary code; simplified start matching; added Symbol::is_star_match - Merge branch 'master' into 8292699-improve-class-printing-in-gdb - @coleenp comments - Allow ClassPrinter to also print to streams other than tty - added functions for printing InstanceKlass* and Method* pointers directly - some code clean-up; added help message - Use proper locking - Do not use class external name in printing - Added detailed printing of invokehandle - ... and 5 more: https://git.openjdk.org/jdk/compare/20736487...c029465b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9957/files - new: https://git.openjdk.org/jdk/pull/9957/files/e4afb9fa..c029465b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9957&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9957&range=10-11 Stats: 182323 lines in 3671 files changed: 97021 ins; 63297 del; 22005 mod Patch: https://git.openjdk.org/jdk/pull/9957.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9957/head:pull/9957 PR: https://git.openjdk.org/jdk/pull/9957 From iklam at openjdk.org Sun Oct 23 05:33:15 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 23 Oct 2022 05:33:15 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v11] In-Reply-To: <4pVu3JYIKBDS9jMG47NPjqNmDCv_ZWX6h7CsAALE8So=.d64963a1-f126-4d9a-bdec-52f6b9ebfc17@github.com> References: <867TdKmJce7HpB-gIDtPbGAnZlEloOjgLjRfFIwGZgk=.2d7b3a8d-a54a-4363-81ef-a175a34aeebe@github.com> <4pVu3JYIKBDS9jMG47NPjqNmDCv_ZWX6h7CsAALE8So=.d64963a1-f126-4d9a-bdec-52f6b9ebfc17@github.com> Message-ID: <2eVkXntYdW-ZE-7dIFrg13zgEeUCSNvs9k_q1LeJXaA=.4349bafb-f1c1-4c80-9191-37a0dba2810f@github.com> On Fri, 21 Oct 2022 19:01:30 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @coleenp comments > > src/hotspot/share/interpreter/bytecodeTracer.cpp line 171: > >> 169: void BytecodeTracer::trace_interpreter(const methodHandle& method, address bcp, uintptr_t tos, uintptr_t tos2, outputStream* st) { >> 170: if (TraceBytecodes && BytecodeCounter::counter_value() >= TraceBytecodesAt) { >> 171: ttyLocker ttyl; // 5065316: keep the following output coherent > > We should file a a RFE to make this a leaf mutex rather than ttyLocker (which does get broken to allow safepoints!) with no_safepoint_check assuming that the printing doesn't safepoint. Good to remove the no-longer accurate comment. The comment also says: // Using the ttyLocker prevents the system from coming to // a safepoint within this code, which is sensitive to Method* // movement. So are we trying to "prevent Method movement"? Does this even make sense after we switched to Permgen? > src/hotspot/share/utilities/debug.cpp line 664: > >> 662: } >> 663: >> 664: extern "C" JNIEXPORT void printclass(intptr_t k, int flags) { > > It's weird that the function to print the method is findmethod, and the one to print the class is printclass. I see that findmethod is consistent with the 'find' functions above. Maybe this should be findclass() too. I had two groups of printing functions with the following intention: - findclass/findmethod will search for class/methods by their names. 
Call these when you don't have an InstanceKlass* or Method* - printclass/printmethod should be used when you already have an InstanceKlass* or Method* However, I realized that the second group of methods should really be something like InstanceKlass::print_xxx() and Method::print_xxx(), so you can do something like this in gdb call ik->print_all_methods_with_bytecodes(); I have removed the second group of functions from this PR (commit [6675901](https://github.com/openjdk/jdk/pull/9957/commits/6675901125018853bad316a62702a8f7c62f331b)). I may add them in a follow-up PR if necessary. ------------- PR: https://git.openjdk.org/jdk/pull/9957 From iklam at openjdk.org Sun Oct 23 06:00:59 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 23 Oct 2022 06:00:59 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v13] In-Reply-To: References: Message-ID: <9VL-tg7BOrgrRXy_8pnhttyKgoQOjvCHvAewWXj_eLE=.c4236b50-159a-4059-b905-79691d5559cf@github.com> > Current in gdb, you can print information about a class or method with something like > > > call ((InstanceKlass*)0x00000008000411b8)->print_on(tty) > call ((Method*)0x00007fffb4000d08)->print_codes_on(tty) > > > However, it's difficult to find a class or method by its name and print out its contents. > > This RFE adds 3 new functions in debug.cpp so you can easily find classes/methods and print out their contents. They all have a `flags` argument that controls the verbosity. > > - `findclass()`: class name only > - `findmethod()`: class name and method name > - `findmethod2()`: class name and method name/signature > > I also cleaned up `BytecodeTracer` to remove unnecessary complexity. > > Here are some examples: > > > (gdb) call findclass("java/lang/Object", 0) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > > (gdb) call findclass("java/lang/Object", 1) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000658 : ()V > 0x00007fffb40010f0 finalize : ()V > 0x00007fffb4000f00 wait0 : (J)V > 0x00007fffb40008e8 equals : (Ljava/lang/Object;)Z > 0x00007fffb4000aa0 toString : ()Ljava/lang/String; > 0x00007fffb40007f0 hashCode : ()I > 0x00007fffb4000720 getClass : ()Ljava/lang/Class; > 0x00007fffb40009a0 clone : ()Ljava/lang/Object; > 0x00007fffb4000b50 notify : ()V > 0x00007fffb4000c20 notifyAll : ()V > 0x00007fffb4000e50 wait : (J)V > 0x00007fffb4001028 wait : (JI)V > 0x00007fffb4000d08 wait : ()V > > (gdb) call findclass("*ClassLoader", 0) > [0] 0x000000080007de40 jdk.internal.loader.ClassLoaders$BootClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [1] 0x0000000800053c58 jdk.internal.loader.ClassLoaders$PlatformClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [2] 0x0000000800053918 jdk.internal.loader.ClassLoaders$AppClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [....] 
> > (gdb) call findmethod2("*ang/Object*", "wait", "()V", 0x7) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000d08 wait : ()V > 0x00007fffb4000ce8 0 fast_aload_0 > 0x00007fffb4000ce9 1 lconst_0 > 0x00007fffb4000cea 2 invokevirtual 38 > 0x00007fffb4000ced 5 return Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: removed the use of "continue" keyword; avoid printing class name when none of its methods match ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9957/files - new: https://git.openjdk.org/jdk/pull/9957/files/c029465b..51558336 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9957&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9957&range=11-12 Stats: 34 lines in 1 file changed: 15 ins; 5 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/9957.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9957/head:pull/9957 PR: https://git.openjdk.org/jdk/pull/9957 From iklam at openjdk.org Sun Oct 23 06:10:00 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 23 Oct 2022 06:10:00 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v14] In-Reply-To: References: Message-ID: > Current in gdb, you can print information about a class or method with something like > > > call ((InstanceKlass*)0x00000008000411b8)->print_on(tty) > call ((Method*)0x00007fffb4000d08)->print_codes_on(tty) > > > However, it's difficult to find a class or method by its name and print out its contents. > > This RFE adds 3 new functions in debug.cpp so you can easily find classes/methods and print out their contents. They all have a `flags` argument that controls the verbosity. > > - `findclass()`: class name only > - `findmethod()`: class name and method name > - `findmethod2()`: class name and method name/signature > > I also cleaned up `BytecodeTracer` to remove unnecessary complexity. > > Here are some examples: > > > (gdb) call findclass("java/lang/Object", 0) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > > (gdb) call findclass("java/lang/Object", 1) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000658 : ()V > 0x00007fffb40010f0 finalize : ()V > 0x00007fffb4000f00 wait0 : (J)V > 0x00007fffb40008e8 equals : (Ljava/lang/Object;)Z > 0x00007fffb4000aa0 toString : ()Ljava/lang/String; > 0x00007fffb40007f0 hashCode : ()I > 0x00007fffb4000720 getClass : ()Ljava/lang/Class; > 0x00007fffb40009a0 clone : ()Ljava/lang/Object; > 0x00007fffb4000b50 notify : ()V > 0x00007fffb4000c20 notifyAll : ()V > 0x00007fffb4000e50 wait : (J)V > 0x00007fffb4001028 wait : (JI)V > 0x00007fffb4000d08 wait : ()V > > (gdb) call findclass("*ClassLoader", 0) > [0] 0x000000080007de40 jdk.internal.loader.ClassLoaders$BootClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [1] 0x0000000800053c58 jdk.internal.loader.ClassLoaders$PlatformClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [2] 0x0000000800053918 jdk.internal.loader.ClassLoaders$AppClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [....] 
> > (gdb) call findmethod2("*ang/Object*", "wait", "()V", 0x7) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000d08 wait : ()V > 0x00007fffb4000ce8 0 fast_aload_0 > 0x00007fffb4000ce9 1 lconst_0 > 0x00007fffb4000cea 2 invokevirtual 38 > 0x00007fffb4000ced 5 return Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed bug where findclass("*", 0x1) does not print classes with no methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9957/files - new: https://git.openjdk.org/jdk/pull/9957/files/51558336..0e55ee8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9957&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9957&range=12-13 Stats: 13 lines in 1 file changed: 8 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/9957.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9957/head:pull/9957 PR: https://git.openjdk.org/jdk/pull/9957 From iklam at openjdk.org Sun Oct 23 06:10:00 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 23 Oct 2022 06:10:00 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v11] In-Reply-To: <3pf4ZhDp-3EVyZ2U28L4m8HitjnKsLb2Q6RJGw58mQk=.b2ee8da0-37bb-4839-aa82-c89ea342506a@github.com> References: <867TdKmJce7HpB-gIDtPbGAnZlEloOjgLjRfFIwGZgk=.2d7b3a8d-a54a-4363-81ef-a175a34aeebe@github.com> <3pf4ZhDp-3EVyZ2U28L4m8HitjnKsLb2Q6RJGw58mQk=.b2ee8da0-37bb-4839-aa82-c89ea342506a@github.com> Message-ID: On Fri, 21 Oct 2022 18:36:05 GMT, Matias Saavedra Silva wrote: > Overall, this looks good to me! I just have two comments: > > 1. findmethod2() could use a better name, maybe something along the lines of find_method_and_signature()? Not sure if these method calls need to be in camel case or snake case as well. > 2. I noticed the use of continue statements in classFilePrinter: that code could probably be refactored into something that avoids the use of continue I merged findmethod() and findmethod2() into a single function, so you can optionally specify the signature (which is not very common) // call findmethod("*ang/Object*", "wait", 0xff) -> detailed disasm of all "wait" methods in j.l.Object // call findmethod("*ang/Object*", "wait:(*J*)V", 0x1) -> list all "wait" methods in j.l.Object that have a long parameter The functions in debug.cpp are supposed to have short names so they can be easily typed in the debugger. E.g., we have `pp`, `pns` and `pns2`. I refactored the code to avoid using the "continue" keyword. I think it's easier to read now. Thanks for the suggestion! ------------- PR: https://git.openjdk.org/jdk/pull/9957 From iklam at openjdk.org Sun Oct 23 06:16:57 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 23 Oct 2022 06:16:57 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v15] In-Reply-To: References: Message-ID: > Current in gdb, you can print information about a class or method with something like > > > call ((InstanceKlass*)0x00000008000411b8)->print_on(tty) > call ((Method*)0x00007fffb4000d08)->print_codes_on(tty) > > > However, it's difficult to find a class or method by its name and print out its contents. > > This RFE adds 3 new functions in debug.cpp so you can easily find classes/methods and print out their contents. They all have a `flags` argument that controls the verbosity. 
> > - `findclass()`: class name only > - `findmethod()`: class name and method name > - `findmethod2()`: class name and method name/signature > > I also cleaned up `BytecodeTracer` to remove unnecessary complexity. > > Here are some examples: > > > (gdb) call findclass("java/lang/Object", 0) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > > (gdb) call findclass("java/lang/Object", 1) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000658 : ()V > 0x00007fffb40010f0 finalize : ()V > 0x00007fffb4000f00 wait0 : (J)V > 0x00007fffb40008e8 equals : (Ljava/lang/Object;)Z > 0x00007fffb4000aa0 toString : ()Ljava/lang/String; > 0x00007fffb40007f0 hashCode : ()I > 0x00007fffb4000720 getClass : ()Ljava/lang/Class; > 0x00007fffb40009a0 clone : ()Ljava/lang/Object; > 0x00007fffb4000b50 notify : ()V > 0x00007fffb4000c20 notifyAll : ()V > 0x00007fffb4000e50 wait : (J)V > 0x00007fffb4001028 wait : (JI)V > 0x00007fffb4000d08 wait : ()V > > (gdb) call findclass("*ClassLoader", 0) > [0] 0x000000080007de40 jdk.internal.loader.ClassLoaders$BootClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [1] 0x0000000800053c58 jdk.internal.loader.ClassLoaders$PlatformClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [2] 0x0000000800053918 jdk.internal.loader.ClassLoaders$AppClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [....] > > (gdb) call findmethod2("*ang/Object*", "wait", "()V", 0x7) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000d08 wait : ()V > 0x00007fffb4000ce8 0 fast_aload_0 > 0x00007fffb4000ce9 1 lconst_0 > 0x00007fffb4000cea 2 invokevirtual 38 > 0x00007fffb4000ced 5 return Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: added missing ResourceMark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9957/files - new: https://git.openjdk.org/jdk/pull/9957/files/0e55ee8a..a4011ba0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9957&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9957&range=13-14 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9957.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9957/head:pull/9957 PR: https://git.openjdk.org/jdk/pull/9957 From iklam at openjdk.org Sun Oct 23 19:43:51 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 23 Oct 2022 19:43:51 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v16] In-Reply-To: References: Message-ID: > Current in gdb, you can print information about a class or method with something like > > > call ((InstanceKlass*)0x00000008000411b8)->print_on(tty) > call ((Method*)0x00007fffb4000d08)->print_codes_on(tty) > > > However, it's difficult to find a class or method by its name and print out its contents. > > This RFE adds 3 new functions in debug.cpp so you can easily find classes/methods and print out their contents. They all have a `flags` argument that controls the verbosity. > > - `findclass()`: class name only > - `findmethod()`: class name and method name > - `findmethod2()`: class name and method name/signature > > I also cleaned up `BytecodeTracer` to remove unnecessary complexity. 
> > Here are some examples: > > > (gdb) call findclass("java/lang/Object", 0) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > > (gdb) call findclass("java/lang/Object", 1) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000658 : ()V > 0x00007fffb40010f0 finalize : ()V > 0x00007fffb4000f00 wait0 : (J)V > 0x00007fffb40008e8 equals : (Ljava/lang/Object;)Z > 0x00007fffb4000aa0 toString : ()Ljava/lang/String; > 0x00007fffb40007f0 hashCode : ()I > 0x00007fffb4000720 getClass : ()Ljava/lang/Class; > 0x00007fffb40009a0 clone : ()Ljava/lang/Object; > 0x00007fffb4000b50 notify : ()V > 0x00007fffb4000c20 notifyAll : ()V > 0x00007fffb4000e50 wait : (J)V > 0x00007fffb4001028 wait : (JI)V > 0x00007fffb4000d08 wait : ()V > > (gdb) call findclass("*ClassLoader", 0) > [0] 0x000000080007de40 jdk.internal.loader.ClassLoaders$BootClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [1] 0x0000000800053c58 jdk.internal.loader.ClassLoaders$PlatformClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [2] 0x0000000800053918 jdk.internal.loader.ClassLoaders$AppClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [....] > > (gdb) call findmethod2("*ang/Object*", "wait", "()V", 0x7) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000d08 wait : ()V > 0x00007fffb4000ce8 0 fast_aload_0 > 0x00007fffb4000ce9 1 lconst_0 > 0x00007fffb4000cea 2 invokevirtual 38 > 0x00007fffb4000ced 5 return Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: One more ResourceMark fix; fixed gtest case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9957/files - new: https://git.openjdk.org/jdk/pull/9957/files/a4011ba0..92adeaa7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9957&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9957&range=14-15 Stats: 7 lines in 2 files changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9957.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9957/head:pull/9957 PR: https://git.openjdk.org/jdk/pull/9957 From iklam at openjdk.org Sun Oct 23 23:27:20 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 23 Oct 2022 23:27:20 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v4] In-Reply-To: References: <9tmGKUHiwVATyrTUwn8Po2tl4Ba5NqmwARD_Vh4bbro=.1215e5cb-7c33-4aa1-9091-215175508c02@github.com> Message-ID: On Thu, 20 Oct 2022 00:16:33 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - @coleenp comments: changed to AllStatic >> - Merge branch 'master' into 8293979-resolve-class-references-at-dumptime >> - fixed product build >> - @coleenp comments >> - Merge branch 'master' into 8293979-resolve-class-references-at-dumptime >> - 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time > > Maybe the third read is the charm but this makes sense to me, and looks good. Thanks to @coleenp and @calvinccheung for the review. Passed tiers1-2. 
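As a side note on the JVM_CONSTANT_Class pre-resolution change acknowledged just above: a toy model (not the real ClassPrelinker; the policy itself is quoted in full in the integration notice further down) of the dump-time rule, namely that an entry is only pre-resolved when its target is guaranteed to be identical at run time, approximated here as "one of the always-resolved core classes" or "a supertype of the holder".

#include <set>
#include <string>
#include <vector>

struct ToyClass {
  std::string name;
  std::vector<std::string> supertypes;   // super class and all interfaces
};

// vm_classes: classes resolved up front, which cannot be replaced by agents later.
bool can_preresolve_at_dump_time(const ToyClass& holder,
                                 const std::string& referenced,
                                 const std::set<std::string>& vm_classes) {
  if (vm_classes.count(referenced) != 0) {
    return true;                         // always resolved, so the value is stable
  }
  for (const std::string& s : holder.supertypes) {
    if (s == referenced) {
      return true;                       // supertypes are archived together with the holder
    }
  }
  return false;                          // anything else stays unresolved at dump time
}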
------------- PR: https://git.openjdk.org/jdk/pull/10330 From iklam at openjdk.org Sun Oct 23 23:27:20 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 23 Oct 2022 23:27:20 GMT Subject: RFR: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time [v4] In-Reply-To: <97_BDeGvocUwYzhMf7-p7FDhONymbbuDpyKp_1Km0dU=.64f85c5a-2945-4899-b476-c52866f12b98@github.com> References: <9tmGKUHiwVATyrTUwn8Po2tl4Ba5NqmwARD_Vh4bbro=.1215e5cb-7c33-4aa1-9091-215175508c02@github.com> <97_BDeGvocUwYzhMf7-p7FDhONymbbuDpyKp_1Km0dU=.64f85c5a-2945-4899-b476-c52866f12b98@github.com> Message-ID: On Thu, 20 Oct 2022 22:49:52 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - @coleenp comments: changed to AllStatic >> - Merge branch 'master' into 8293979-resolve-class-references-at-dumptime >> - fixed product build >> - @coleenp comments >> - Merge branch 'master' into 8293979-resolve-class-references-at-dumptime >> - 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time > > src/hotspot/share/oops/constantPool.cpp line 376: > >> 374: set_resolved_references(OopHandle()); >> 375: >> 376: bool archived = false; > > I think this declaration could be moved to line 392 since it is only used in that case. Fixed ------------- PR: https://git.openjdk.org/jdk/pull/10330 From iklam at openjdk.org Sun Oct 23 23:29:07 2022 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 23 Oct 2022 23:29:07 GMT Subject: Integrated: 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time In-Reply-To: References: Message-ID: On Mon, 19 Sep 2022 04:33:55 GMT, Ioi Lam wrote: > Some `JVM_CONSTANT_Class` entries are guaranteed to resolve to the same value at both CDS dump time and run time: > > - Classes that are resolved during `vmClasses::resolve_all()`. These classes cannot be replaced by JVMTI agents at run time. > - Supertypes -- at run time, a class `C` can be loaded from the CDS archive only if all of `C`'s super types are also loaded from the CDS archive. Therefore, we know that a `JVM_CONSTANT_Class` reference to a supertype of `C` must resolved to the same value at both CDS dump time and run time. > > By doing the resolution at dump time, we can speed up run time start-up by a little bit. > > The `ClassPrelinker` class added by this PR will also be used in future REFs for pre-resolving other constant pool entries. The ultimate goal is to resolve `invokedynamic` and `invokehandle` so we can significantly improve the start-up time of features such as Lambda expressions and String concatenation. See [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) This pull request has now been integrated. 
Changeset: aad81f2e Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/aad81f2eba5a77a028a58a767fd4afc11b4dd528 Stats: 410 lines in 8 files changed: 357 ins; 39 del; 14 mod 8293979: Resolve JVM_CONSTANT_Class references at CDS dump time Reviewed-by: coleenp, ccheung ------------- PR: https://git.openjdk.org/jdk/pull/10330 From eliu at openjdk.org Mon Oct 24 03:10:52 2022 From: eliu at openjdk.org (Eric Liu) Date: Mon, 24 Oct 2022 03:10:52 GMT Subject: RFR: 8293488: Add EOR3 backend rule for aarch64 SHA3 extension [v3] In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 14:27:34 GMT, Bhavana Kilambi wrote: >> Arm ISA v8.2A and v9.0A include SHA3 feature extensions and one of those SHA3 instructions - "eor3" performs an exclusive OR of three vectors. This is helpful in applications that have multiple, consecutive "eor" operations which can be reduced by clubbing them into fewer operations using the "eor3" instruction. For example - >> >> eor a, a, b >> eor a, a, c >> >> can be optimized to single instruction - `eor3 a, b, c` >> >> This patch adds backend rules for Neon and SVE2 "eor3" instructions and a micro benchmark to assess the performance gains with this patch. Following are the results of the included micro benchmark on a 128-bit aarch64 machine that supports Neon, SVE2 and SHA3 features - >> >> >> Benchmark gain >> TestEor3.test1Int 10.87% >> TestEor3.test1Long 8.84% >> TestEor3.test2Int 21.68% >> TestEor3.test2Long 21.04% >> >> >> The numbers shown are performance gains with using Neon eor3 instruction over the master branch that uses multiple "eor" instructions instead. Similar gains can be observed with the SVE2 "eor3" version as well since the "eor3" instruction is unpredicated and the machine under test uses a maximum vector width of 128 bits which makes the SVE2 code generation very similar to the one with Neon. > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Changed the modifier order preference in JTREG test LGTM. ------------- Marked as reviewed by eliu (Committer). PR: https://git.openjdk.org/jdk/pull/10407 From rehn at openjdk.org Mon Oct 24 06:57:45 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 24 Oct 2022 06:57:45 GMT Subject: RFR: 8233697: CHT: Iteration parallelization [v4] In-Reply-To: References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Message-ID: On Fri, 21 Oct 2022 09:06:22 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to add parallel iteration of the ConcurrentHashTable. The iteration should be done during a safepoint without concurrent modifications to the ConcurrentHashTable. >> >> Usecase is in parallelizing the merging of large remsets for G1. >> >> Some background: The problem is that particularly during (G1) mixed gc it happens that the distribution of contents in the CHT is very unbalanced - young gen regions have a very small remembered set (little work), and old gen regions very large ones (much work). >> >> Since the current work distribution is based on whole remembered sets (i.e. CHTs), this makes for a very unbalanced merge remsets phase in G1 when you have quite a bit more than the number of old gen regions threads at your disposal. >> This negatively impacts pause time predictions (and obviously pause times are longer than necessary as many threads are idling to wait for the phase to complete). 
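To make the parallel-iteration infrastructure this is leading up to concrete, a rough sketch follows (assumed shape only, not the actual InternalTableClaimer or BucketsOperation mentioned later in this thread) of how a safepoint-time scan can hand out fixed-size bucket ranges to worker threads with a single atomic counter.

#include <atomic>
#include <cstddef>

class BucketRangeClaimer {
  std::atomic<size_t> _next{0};
  const size_t _table_size;   // number of buckets
  const size_t _claim_size;   // buckets handed out per claim
public:
  BucketRangeClaimer(size_t table_size, size_t claim_size)
    : _table_size(table_size), _claim_size(claim_size) {}

  // Workers call this until it returns false; [*start, *end) is the claimed slice.
  bool claim(size_t* start, size_t* end) {
    size_t s = _next.fetch_add(_claim_size, std::memory_order_relaxed);
    if (s >= _table_size) {
      return false;           // table fully claimed
    }
    *start = s;
    *end = (s + _claim_size < _table_size) ? s + _claim_size : _table_size;
    return true;
  }
};

Each worker then scans only the buckets it claimed, which keeps the work units small enough to balance the skewed remembered-set sizes described above.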
>> >> This change only adds the infrastructure code in the CHT, there will be a follow-up with G1 changes. >> >> Testing: tier 1-3 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > make claim InternalTableClaimer method Hey! Have you looked in: concurrentHashTableTasks.inline.hpp It contains BucketsOperation which is a base to do segmented work (very similar to BucketsClaimer). The segmented work may be done with multiple threads. ------------- PR: https://git.openjdk.org/jdk/pull/10759 From iwalulya at openjdk.org Mon Oct 24 07:10:33 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 24 Oct 2022 07:10:33 GMT Subject: RFR: 8233697: CHT: Iteration parallelization [v4] In-Reply-To: References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Message-ID: On Mon, 24 Oct 2022 06:55:26 GMT, Robbin Ehn wrote: > Hey! > > Have you looked in: concurrentHashTableTasks.inline.hpp It contains BucketsOperation which is a base to do segmented work (very similar to BucketsClaimer). The segmented work may be done with multiple threads. I missed this, thanks for pointing it out. We should be able to extend the same for the scan operation. ------------- PR: https://git.openjdk.org/jdk/pull/10759 From fyang at openjdk.org Mon Oct 24 07:46:42 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 24 Oct 2022 07:46:42 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Fri, 21 Oct 2022 15:32:02 GMT, Ludovic Henry wrote: >> Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. >> >> [1] https://github.com/riscv/riscv-CMOs >> [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero > > Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: > > - Disable block zeroing in case CacheLineSize isn't the default value > - Disable UseZicboz if CacheLineSize is set by user Still several more comments. What kind of tests has been carried out for this change? src/hotspot/cpu/riscv/globals_riscv.hpp line 90: > 88: "Minimum size in bytes when block zeroing will be used") \ > 89: range(1, max_jint) \ > 90: product(intx, CacheLineSize, DEFAULT_CACHE_LINE_SIZE, \ Given that 64 bytes is effectively an industry standard, I don't think we need to add "CacheLineSize" as an option at this stage. It will be safer if everything here only depend on "UseZic64b" option which only implies cache block size of 64 bytes. This would also simplify the changes in file: src/hotspot/cpu/riscv/vm_version_riscv.cpp. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3985: > 3983: sd(zr, Address(ptr, j*8)); > 3984: } > 3985: addi(ptr, ptr, i*8); Missing space before and after '*' operator here and other places. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 671: > 669: // x29 < MacroAssembler::zero_words_block_size. > 670: > 671: address generate_zero_blocks() { Could you please correct the comment at line #669 please? It looks to me that x29 is bigger than MacroAssembler::zero_words_block_size. 
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 696: > 694: __ mv(tmp1, MacroAssembler::zero_words_block_size); > 695: __ bind(loop); > 696: __ blt(cnt, tmp1, done); Can we avoid adding this one-extra "blt" instruction into the loop here? ------------- Changes requested by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/10718 From rkennke at openjdk.org Mon Oct 24 08:03:13 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 24 Oct 2022 08:03:13 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v7] In-Reply-To: References: Message-ID: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. 
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? > > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. 
> > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 496.076 | 493.873 | 0.45% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaKmeans | 259.384 | 258.648 | 0.28% > Philosophers | 24333.311 | 23438.22 | 3.82% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > ParMnemonics | 2016.917 | 2033.101 | -0.80% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaDoku | 2193.562 | 1958.419 | 12.01% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > Philosophers | 14268.449 | 13308.87 | 7.21% > FinagleChirper | 4722.13 | 4688.3 | 0.72% > FinagleHttp | 3497.241 | 3605.118 | -2.99% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). 
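To restate the locking fast path described at the top of this PR in code form, here is a deliberately simplified, self-contained model (not HotSpot code; it ignores inflation, recursion, GC and safepoints): the low header bits are CASed to the locked pattern and the object is recorded on a small per-thread lock stack, instead of encoding a stack address in the header.

#include <atomic>
#include <cassert>
#include <cstdint>

struct ToyObject {
  std::atomic<uintptr_t> header{0x1};        // low bits 01 = unlocked, 00 = fast-locked
};

struct ToyLockStack {
  ToyObject* elems[8];                       // stays very small in practice
  int top = 0;
  bool owns(ToyObject* o) const {            // "does the current thread own me?"
    for (int i = 0; i < top; i++) {
      if (elems[i] == o) return true;
    }
    return false;
  }
};

thread_local ToyLockStack lock_stack;

bool try_fast_lock(ToyObject* o) {
  assert(lock_stack.top < 8);
  uintptr_t unlocked = o->header.load(std::memory_order_relaxed) | 0x1;  // expect low bits 01
  uintptr_t locked   = unlocked & ~uintptr_t(0x3);                       // desired low bits 00
  if (o->header.compare_exchange_strong(unlocked, locked,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed)) {
    lock_stack.elems[lock_stack.top++] = o;  // ownership = presence on this thread's stack
    return true;
  }
  return false;  // already locked or contended: a real VM would inflate to a monitor here
}

void fast_unlock(ToyObject* o) {
  assert(lock_stack.top > 0 && lock_stack.elems[lock_stack.top - 1] == o);
  lock_stack.top--;
  uintptr_t h = o->header.load(std::memory_order_relaxed);
  o->header.store(h | 0x1, std::memory_order_release);   // back to unlocked
}

The quick per-thread array scan in owns() is what makes the common "does the current thread own this lock?" query cheap, while slower queries can afford to walk all threads' lock stacks.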
> > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) > - [x] jcstress 3-days -t sync -af GLOBAL (x86_64, aarch64) Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits: - Merge remote-tracking branch 'upstream/master' into fast-locking - More RISC-V fixes - Merge remote-tracking branch 'origin/fast-locking' into fast-locking - RISC-V port - Revert "Re-use r0 in call to unlock_object()" This reverts commit ebbcb615a788998596f403b47b72cf133cb9de46. - Merge remote-tracking branch 'origin/fast-locking' into fast-locking - Fix number of rt args to complete_monitor_locking_C, remove some comments - Re-use r0 in call to unlock_object() - Merge tag 'jdk-20+17' into fast-locking Added tag jdk-20+17 for changeset 79ccc791 - Fix OSR packing in AArch64, part 2 - ... and 25 more: https://git.openjdk.org/jdk/compare/65c84e0c...a67eb95e ------------- Changes: https://git.openjdk.org/jdk/pull/10590/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=06 Stats: 4031 lines in 137 files changed: 731 ins; 2703 del; 597 mod Patch: https://git.openjdk.org/jdk/pull/10590.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10590/head:pull/10590 PR: https://git.openjdk.org/jdk/pull/10590 From aboldtch at openjdk.org Mon Oct 24 08:13:54 2022 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 24 Oct 2022 08:13:54 GMT Subject: RFR: 8295808: GrowableArray should support capacity management In-Reply-To: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> References: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> Message-ID: On Sat, 22 Oct 2022 01:38:44 GMT, Kim Barrett wrote: > Please review this change to GrowableArray to support capacity management. > Two functions are added to GrowableArray, reserve and shrink_to_fit. Also > renamed the max_length function to capacity. > > Used these new functions in StringDedupTable. > > Testing: mach5 tier1-3 LGTM. Any thoughts of moving `expand_to` and `shrink_to_fit` to a common function, given that they share a lot of logic. Something like a general resize that * Allocates if `new_capacity != 0` * Copy constructs new elements from `0` to `min(old_capacity, new_capacity)` * Default constructs from `min(old_capacity, new_capacity)` to `new_capacity` * Destroy old elements from `0` to `old_capacity` ------------- Marked as reviewed by aboldtch (Committer). PR: https://git.openjdk.org/jdk/pull/10827 From luhenry at openjdk.org Mon Oct 24 08:23:44 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 24 Oct 2022 08:23:44 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Mon, 24 Oct 2022 07:39:24 GMT, Fei Yang wrote: >> Ludovic Henry has updated the pull request incrementally with two additional commits since the last revision: >> >> - Disable block zeroing in case CacheLineSize isn't the default value >> - Disable UseZicboz if CacheLineSize is set by user > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 671: > >> 669: // x29 < MacroAssembler::zero_words_block_size. 
>> 670: >> 671: address generate_zero_blocks() { > > Could you please correct the comment at line #669 please? It looks to me that x29 is bigger than MacroAssembler::zero_words_block_size. With the loop "Clear the remaining blocks", `x29` will be smaller than `MacroAssembler::zero_words_block_size`. > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 696: > >> 694: __ mv(tmp1, MacroAssembler::zero_words_block_size); >> 695: __ bind(loop); >> 696: __ blt(cnt, tmp1, done); > > Can we avoid adding this one-extra "blt" instruction into the loop here? It replaces the `bgez` from line 691. It's also required since there is no guarantee that there will be at least `MacroAssembler::zero_words_block_size` left. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Mon Oct 24 08:27:57 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 24 Oct 2022 08:27:57 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> On Mon, 24 Oct 2022 07:43:17 GMT, Fei Yang wrote: > Still several more comments. What kind of tests has been carried out for this change? I've run `hotspot:tier1` tests on a QEMU with support for `Zicbo{z,p,m}` > src/hotspot/cpu/riscv/globals_riscv.hpp line 90: > >> 88: "Minimum size in bytes when block zeroing will be used") \ >> 89: range(1, max_jint) \ >> 90: product(intx, CacheLineSize, DEFAULT_CACHE_LINE_SIZE, \ > > Given that 64 bytes is effectively an industry standard, I don't think we need to add "CacheLineSize" as an option at this stage. It will be safer if everything here only depend on "UseZic64b" option which only implies cache block size of 64 bytes. This would also simplify the changes in file: src/hotspot/cpu/riscv/vm_version_riscv.cpp. Is there is no guarantee that a platform will provide both `Zicboz` _and_ `Zic64b` even if the cache lines are 64 bytes? As in, the cache lines are 64 bytes, but the CPU doesn't notify the kernel that it supports `Zic64b`? In that case, we won't use `Zicboz` which can be significantly faster than other approaches to zero out memory (the CPU is free not to fetch a cache line before storing it, since the whole cache line is zeroed out). Given the overall cost of zeroing out memory in Java, it's necessary to ensure it's as fast as possible. I overall agree with the 64 bytes is an industry standard. What I want to ensure is that we can eagerly enable `UseBlockZeroing` whenever we can. 
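For readers following along, here is a minimal sketch of the loop shape being discussed for `generate_zero_blocks` (illustrative only: the `cbo_zero` helper, the register roles and the byte/word bookkeeping are assumptions pieced together from the quoted lines, not the actual patch, and a 64-byte cache block matching `zero_words_block_size` is assumed):

    // Sketch: zero whole cache blocks with cbo.zero; fall out of the loop as
    // soon as fewer than zero_words_block_size words remain, so the single
    // blt both guards entry and terminates the loop (replacing the old bgez).
    Label loop, done;
    __ mv(tmp1, MacroAssembler::zero_words_block_size);   // words per block
    __ bind(loop);
    __ blt(cnt, tmp1, done);                              // < one block left -> done
    __ cbo_zero(base);                                    // zero one cache block at 'base'
    __ addi(base, base, MacroAssembler::zero_words_block_size * wordSize);  // advance in bytes
    __ sub(cnt, cnt, tmp1);                               // consumed one block of words
    __ j(loop);
    __ bind(done);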
------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Mon Oct 24 08:32:53 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 24 Oct 2022 08:32:53 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> Message-ID: On Mon, 24 Oct 2022 08:24:46 GMT, Ludovic Henry wrote: >> src/hotspot/cpu/riscv/globals_riscv.hpp line 90: >> >>> 88: "Minimum size in bytes when block zeroing will be used") \ >>> 89: range(1, max_jint) \ >>> 90: product(intx, CacheLineSize, DEFAULT_CACHE_LINE_SIZE, \ >> >> Given that 64 bytes is effectively an industry standard, I don't think we need to add "CacheLineSize" as an option at this stage. It will be safer if everything here only depend on "UseZic64b" option which only implies cache block size of 64 bytes. This would also simplify the changes in file: src/hotspot/cpu/riscv/vm_version_riscv.cpp. > > Is there is no guarantee that a platform will provide both `Zicboz` _and_ `Zic64b` even if the cache lines are 64 bytes? As in, the cache lines are 64 bytes, but the CPU doesn't notify the kernel that it supports `Zic64b`? > > In that case, we won't use `Zicboz` which can be significantly faster than other approaches to zero out memory (the CPU is free not to fetch a cache line before storing it, since the whole cache line is zeroed out). Given the overall cost of zeroing out memory in Java, it's necessary to ensure it's as fast as possible. > > I overall agree with the 64 bytes is an industry standard. What I want to ensure is that we can eagerly enable `UseBlockZeroing` whenever we can. Another solution is to do something similar to AArch64 and have vendor-specific setup in `src/hotspot/cpu/riscv/vm_version_riscv.cpp`. See https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L132-L222 for example. `CacheLineSize` can then be then be a global variable defined in `src/hotspot/cpu/riscv/globalDefinitions_riscv.hpp` which is set based on the vendor and a default value. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From fyang at openjdk.org Mon Oct 24 08:40:52 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 24 Oct 2022 08:40:52 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Mon, 24 Oct 2022 08:20:13 GMT, Ludovic Henry wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 671: >> >>> 669: // x29 < MacroAssembler::zero_words_block_size. >>> 670: >>> 671: address generate_zero_blocks() { >> >> Could you please correct the comment at line #669 please? It looks to me that x29 is bigger than MacroAssembler::zero_words_block_size. > > With the loop "Clear the remaining blocks", `x29` will be smaller than `MacroAssembler::zero_words_block_size`. I see. It is describing the output instead of input here. Thanks. 
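To make the flag interaction discussed in this sub-thread concrete, here is a minimal sketch of how the ergonomics could be wired up in vm_version_riscv.cpp. This is hypothetical code, not the actual patch; it only uses the flag names that appear in the review (UseZicboz, UseZic64b, UseBlockZeroing):

    // Sketch: only enable block zeroing when both cbo.zero and a known
    // 64-byte cache block are advertised; otherwise fall back and warn.
    if (UseZicboz && UseZic64b) {
      if (FLAG_IS_DEFAULT(UseBlockZeroing)) {
        FLAG_SET_DEFAULT(UseBlockZeroing, true);
      }
    } else if (UseBlockZeroing) {
      warning("UseBlockZeroing requires Zicboz and Zic64b; disabling it");
      FLAG_SET_DEFAULT(UseBlockZeroing, false);
    }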
------------- PR: https://git.openjdk.org/jdk/pull/10718 From fyang at openjdk.org Mon Oct 24 08:43:57 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 24 Oct 2022 08:43:57 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Mon, 24 Oct 2022 08:19:11 GMT, Ludovic Henry wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 696: >> >>> 694: __ mv(tmp1, MacroAssembler::zero_words_block_size); >>> 695: __ bind(loop); >>> 696: __ blt(cnt, tmp1, done); >> >> Can we avoid adding this one-extra "blt" instruction into the loop here? > > It replaces the `bgez` from line 691. It's also required since there is no guarantee that there will be at least `MacroAssembler::zero_words_block_size` left. How about keeping the original 'bltz' check before the loop and 'bgez' check in the loop? ------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Mon Oct 24 08:48:26 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 24 Oct 2022 08:48:26 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v12] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: Fix spacing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/e232c1b7..4cbaaa49 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=10-11 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Mon Oct 24 08:58:32 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 24 Oct 2022 08:58:32 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v13] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. 
> > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/4cbaaa49..d01f4b1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=11-12 Stats: 7 lines in 2 files changed: 1 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From fyang at openjdk.org Mon Oct 24 09:04:22 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 24 Oct 2022 09:04:22 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> Message-ID: On Mon, 24 Oct 2022 08:30:31 GMT, Ludovic Henry wrote: >> Is there is no guarantee that a platform will provide both `Zicboz` _and_ `Zic64b` even if the cache lines are 64 bytes? As in, the cache lines are 64 bytes, but the CPU doesn't notify the kernel that it supports `Zic64b`? >> >> In that case, we won't use `Zicboz` which can be significantly faster than other approaches to zero out memory (the CPU is free not to fetch a cache line before storing it, since the whole cache line is zeroed out). Given the overall cost of zeroing out memory in Java, it's necessary to ensure it's as fast as possible. >> >> I overall agree with the 64 bytes is an industry standard. What I want to ensure is that we can eagerly enable `UseBlockZeroing` whenever we can. > > Another solution is to do something similar to AArch64 and have vendor-specific setup in `src/hotspot/cpu/riscv/vm_version_riscv.cpp`. See https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L132-L222 for example. `CacheLineSize` can then be then be a global variable defined in `src/hotspot/cpu/riscv/globalDefinitions_riscv.hpp` which is set based on the vendor and a default value. > Is there is no guarantee that a platform will provide both `Zicboz` _and_ `Zic64b` even if the cache lines are 64 bytes? As in, the cache lines are 64 bytes, but the CPU doesn't notify the kernel that it supports `Zic64b`? > > In that case, we won't use `Zicboz` which can be significantly faster than other approaches to zero out memory (the CPU is free not to fetch a cache line before storing it, since the whole cache line is zeroed out). Given the overall cost of zeroing out memory in Java, it's necessary to ensure it's as fast as possible. > > I overall agree with the 64 bytes is an industry standard. What I want to ensure is that we can eagerly enable `UseBlockZeroing` whenever we can. AFAIK, it's still not clear how those extensions like Zva, Zvb, Zicboz, etc, should be auto-detected at runtime for now. I see this is still under discussion on LKML not long ago. So for now we depend on users to specify corresponding JVM options explicitly on the command line when the extensions are available. So for your concern here, I would expect the user to specify both Zicboz and Zic64b explictly for performance. 
We will revisit the auto-detection part once the correct way is settled down. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Mon Oct 24 09:04:24 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 24 Oct 2022 09:04:24 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> Message-ID: On Mon, 24 Oct 2022 08:58:01 GMT, Fei Yang wrote: >> Another solution is to do something similar to AArch64 and have vendor-specific setup in `src/hotspot/cpu/riscv/vm_version_riscv.cpp`. See https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L132-L222 for example. `CacheLineSize` can then be then be a global variable defined in `src/hotspot/cpu/riscv/globalDefinitions_riscv.hpp` which is set based on the vendor and a default value. > >> Is there is no guarantee that a platform will provide both `Zicboz` _and_ `Zic64b` even if the cache lines are 64 bytes? As in, the cache lines are 64 bytes, but the CPU doesn't notify the kernel that it supports `Zic64b`? >> >> In that case, we won't use `Zicboz` which can be significantly faster than other approaches to zero out memory (the CPU is free not to fetch a cache line before storing it, since the whole cache line is zeroed out). Given the overall cost of zeroing out memory in Java, it's necessary to ensure it's as fast as possible. >> >> I overall agree with the 64 bytes is an industry standard. What I want to ensure is that we can eagerly enable `UseBlockZeroing` whenever we can. > > AFAIK, it's still not clear how those extensions like Zva, Zvb, Zicboz, etc, should be auto-detected at runtime for now. > I see this is still under discussion on LKML not long ago. So for now we depend on users to specify corresponding JVM options explicitly on the command line when the extensions are available. So for your concern here, I would expect the user to specify both Zicboz and Zic64b explictly for performance. We will revisit the auto-detection part once the correct way is settled down. We can already do auto-detection via `/proc/cpuinfo`. I agree it's not the best way nor the most stable way, and I can only assume there will be a better way, but there is already a way. I'll remove this `CacheLineSize` flag for now and we'll revisit later. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From thartmann at openjdk.org Mon Oct 24 09:06:55 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 24 Oct 2022 09:06:55 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v3] In-Reply-To: References: Message-ID: <9XWZNcNcmELCLXDwpuNgpztPrw8xXajJQcj_daf4jhU=.4af44336-021f-4688-9a56-6a90c8e12f53@github.com> On Fri, 21 Oct 2022 20:20:58 GMT, vpaprotsk wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. 
>> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > vpaprotsk has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > further restrict UsePolyIntrinsics with supports_avx512vlbw Thanks, I'll re-run testing. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From fyang at openjdk.org Mon Oct 24 09:16:55 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 24 Oct 2022 09:16:55 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> Message-ID: On Mon, 24 Oct 2022 09:01:39 GMT, Ludovic Henry wrote: > We can already do auto-detection via `/proc/cpuinfo`. I agree it's not the best way nor the most stable way, and I can only assume there will be a better way, but there is already a way. I think I mean extensions like 'Zba', 'Zbb', etc here. Are you sure those could be detected from `/proc/cpuinfo`? I am assuming that only works for extensions contained in RV64GCV (IMAFDCV). ------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Mon Oct 24 09:26:47 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 24 Oct 2022 09:26:47 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> Message-ID: On Mon, 24 Oct 2022 09:13:09 GMT, Fei Yang wrote: >> We can already do auto-detection via `/proc/cpuinfo`. I agree it's not the best way nor the most stable way, and I can only assume there will be a better way, but there is already a way. >> >> I'll remove this `CacheLineSize` flag for now and we'll revisit later. > >> We can already do auto-detection via `/proc/cpuinfo`. I agree it's not the best way nor the most stable way, and I can only assume there will be a better way, but there is already a way. > > I think I mean extensions like 'Zba', 'Zbb', etc here. > Are you sure those could be detected from `/proc/cpuinfo`? > I am assuming that only works for extensions contained in RV64GCV (IMAFDCV). 
This is a sample of `/proc/cpuinfo` on a QEMU: processor : 0 hart : 30 isa : rv64imafdcvh_zicsr_zifencei_zihintpause_zba_zbb_zbc_zbs_sstc mmu : sv48 You can `strstr` (or some alternative of it) on the `isa` string for the extension you want. That `isa` string comes from the kernel at https://github.com/torvalds/linux/blob/master/arch/riscv/kernel/cpu.c#L141-L146. The obvious problem at the moment is that not all extensions have been added here which is why we will want a better detection mechanism. However, on custom kernels, you can have most of the extensions you want. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Mon Oct 24 09:38:11 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 24 Oct 2022 09:38:11 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v14] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: <_wXNM8cc7Tw_2wMdwTFy7_vSgA6g9d-KZ3pBiEynuPU=.5f5595b2-3778-49dd-948d-e4bb1d066e06@github.com> > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: Enable Block Zeroing when Usez64b is enabled only ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/d01f4b1c..021f53d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=12-13 Stats: 27 lines in 5 files changed: 0 ins; 18 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Mon Oct 24 09:38:11 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 24 Oct 2022 09:38:11 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> Message-ID: On Mon, 24 Oct 2022 09:13:09 GMT, Fei Yang wrote: >> We can already do auto-detection via `/proc/cpuinfo`. I agree it's not the best way nor the most stable way, and I can only assume there will be a better way, but there is already a way. >> >> I'll remove this `CacheLineSize` flag for now and we'll revisit later. > >> We can already do auto-detection via `/proc/cpuinfo`. I agree it's not the best way nor the most stable way, and I can only assume there will be a better way, but there is already a way. > > I think I mean extensions like 'Zba', 'Zbb', etc here. > Are you sure those could be detected from `/proc/cpuinfo`? > I am assuming that only works for extensions contained in RV64GCV (IMAFDCV). @RealFYang I've done the change as you requested. 
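As a stand-alone illustration of the strstr-on-`isa` approach described above, here is a small C++ sketch. It is not the vm_version_riscv.cpp code; the extension spelling follows the sample output, and as noted it only sees extensions the kernel chooses to print:

    #include <cstdio>
    #include <cstring>

    // Returns true if the "isa" line of /proc/cpuinfo mentions the given
    // extension substring, e.g. "_zba" or "_zicboz".
    static bool cpuinfo_has_ext(const char* ext) {
      FILE* f = fopen("/proc/cpuinfo", "r");
      if (f == nullptr) return false;
      char line[4096];
      bool found = false;
      while (fgets(line, sizeof(line), f) != nullptr) {
        if (strncmp(line, "isa", 3) == 0 && strstr(line, ext) != nullptr) {
          found = true;
          break;
        }
      }
      fclose(f);
      return found;
    }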
------------- PR: https://git.openjdk.org/jdk/pull/10718 From rehn at openjdk.org Mon Oct 24 11:04:16 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 24 Oct 2022 11:04:16 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v7] In-Reply-To: References: Message-ID: On Mon, 24 Oct 2022 08:03:13 GMT, Roman Kennke wrote: >> This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. 
The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> ### Benchmarks >> >> All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. >> >> #### DaCapo/AArch64 >> >> Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? >> >> benchmark | baseline | fast-locking | % | size >> -- | -- | -- | -- | -- >> avrora | 27859 | 27563 | 1.07% | large >> batik | 20786 | 20847 | -0.29% | large >> biojava | 27421 | 27334 | 0.32% | default >> eclipse | 59918 | 60522 | -1.00% | large >> fop | 3670 | 3678 | -0.22% | default >> graphchi | 2088 | 2060 | 1.36% | default >> h2 | 297391 | 291292 | 2.09% | huge >> jme | 8762 | 8877 | -1.30% | default >> jython | 18938 | 18878 | 0.32% | default >> luindex | 1339 | 1325 | 1.06% | default >> lusearch | 918 | 936 | -1.92% | default >> pmd | 58291 | 58423 | -0.23% | large >> sunflow | 32617 | 24961 | 30.67% | large >> tomcat | 25481 | 25992 | -1.97% | large >> tradebeans | 314640 | 311706 | 0.94% | huge >> tradesoap | 107473 | 110246 | -2.52% | huge >> xalan | 6047 | 5882 | 2.81% | default >> zxing | 970 | 926 | 4.75% | default >> >> #### DaCapo/x86_64 >> >> The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. 
>> >> benchmark | baseline | fast-Locking | % | size >> -- | -- | -- | -- | -- >> avrora | 127690 | 126749 | 0.74% | large >> batik | 12736 | 12641 | 0.75% | large >> biojava | 15423 | 15404 | 0.12% | default >> eclipse | 41174 | 41498 | -0.78% | large >> fop | 2184 | 2172 | 0.55% | default >> graphchi | 1579 | 1560 | 1.22% | default >> h2 | 227614 | 230040 | -1.05% | huge >> jme | 8591 | 8398 | 2.30% | default >> jython | 13473 | 13356 | 0.88% | default >> luindex | 824 | 813 | 1.35% | default >> lusearch | 962 | 968 | -0.62% | default >> pmd | 40827 | 39654 | 2.96% | large >> sunflow | 53362 | 43475 | 22.74% | large >> tomcat | 27549 | 28029 | -1.71% | large >> tradebeans | 190757 | 190994 | -0.12% | huge >> tradesoap | 68099 | 67934 | 0.24% | huge >> xalan | 7969 | 8178 | -2.56% | default >> zxing | 1176 | 1148 | 2.44% | default >> >> #### Renaissance/AArch64 >> >> This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. >> >> benchmark | baseline | fast-locking | % >> -- | -- | -- | -- >> AkkaUct | 2558.832 | 2513.594 | 1.80% >> Reactors | 14715.626 | 14311.246 | 2.83% >> Als | 1851.485 | 1869.622 | -0.97% >> ChiSquare | 1007.788 | 1003.165 | 0.46% >> GaussMix | 1157.491 | 1149.969 | 0.65% >> LogRegression | 717.772 | 733.576 | -2.15% >> MovieLens | 7916.181 | 8002.226 | -1.08% >> NaiveBayes | 395.296 | 386.611 | 2.25% >> PageRank | 4294.939 | 4346.333 | -1.18% >> FjKmeans | 496.076 | 493.873 | 0.45% >> FutureGenetic | 2578.504 | 2589.255 | -0.42% >> Mnemonics | 4898.886 | 4903.689 | -0.10% >> ParMnemonics | 4260.507 | 4210.121 | 1.20% >> Scrabble | 139.37 | 138.312 | 0.76% >> RxScrabble | 320.114 | 322.651 | -0.79% >> Dotty | 1056.543 | 1068.492 | -1.12% >> ScalaDoku | 3443.117 | 3449.477 | -0.18% >> ScalaKmeans | 259.384 | 258.648 | 0.28% >> Philosophers | 24333.311 | 23438.22 | 3.82% >> ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% >> FinagleChirper | 6814.192 | 6853.38 | -0.57% >> FinagleHttp | 4762.902 | 4807.564 | -0.93% >> >> #### Renaissance/x86_64 >> >> benchmark | baseline | fast-locking | % >> -- | -- | -- | -- >> AkkaUct | 1117.185 | 1116.425 | 0.07% >> Reactors | 11561.354 | 11812.499 | -2.13% >> Als | 1580.838 | 1575.318 | 0.35% >> ChiSquare | 459.601 | 467.109 | -1.61% >> GaussMix | 705.944 | 685.595 | 2.97% >> LogRegression | 659.944 | 656.428 | 0.54% >> MovieLens | 7434.303 | 7592.271 | -2.08% >> NaiveBayes | 413.482 | 417.369 | -0.93% >> PageRank | 3259.233 | 3276.589 | -0.53% >> FjKmeans | 946.429 | 938.991 | 0.79% >> FutureGenetic | 1760.672 | 1815.272 | -3.01% >> ParMnemonics | 2016.917 | 2033.101 | -0.80% >> Scrabble | 147.996 | 150.084 | -1.39% >> RxScrabble | 177.755 | 177.956 | -0.11% >> Dotty | 673.754 | 683.919 | -1.49% >> ScalaDoku | 2193.562 | 1958.419 | 12.01% >> ScalaKmeans | 165.376 | 168.925 | -2.10% >> ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% >> Philosophers | 14268.449 | 13308.87 | 7.21% >> FinagleChirper | 4722.13 | 4688.3 | 0.72% >> FinagleHttp | 3497.241 | 3605.118 | -2.99% >> >> Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. >> >> I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. 
They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). >> >> Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. >> >> ### Testing >> - [x] tier1 (x86_64, aarch64, x86_32) >> - [x] tier2 (x86_64, aarch64) >> - [x] tier3 (x86_64, aarch64) >> - [x] tier4 (x86_64, aarch64) >> - [x] jcstress 3-days -t sync -af GLOBAL (x86_64, aarch64) > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits: > > - Merge remote-tracking branch 'upstream/master' into fast-locking > - More RISC-V fixes > - Merge remote-tracking branch 'origin/fast-locking' into fast-locking > - RISC-V port > - Revert "Re-use r0 in call to unlock_object()" > > This reverts commit ebbcb615a788998596f403b47b72cf133cb9de46. > - Merge remote-tracking branch 'origin/fast-locking' into fast-locking > - Fix number of rt args to complete_monitor_locking_C, remove some comments > - Re-use r0 in call to unlock_object() > - Merge tag 'jdk-20+17' into fast-locking > > Added tag jdk-20+17 for changeset 79ccc791 > - Fix OSR packing in AArch64, part 2 > - ... and 25 more: https://git.openjdk.org/jdk/compare/65c84e0c...a67eb95e First, the "SharedRuntime::complete_monitor_locking_C" crash does not reproduce. Secondly, a question/suggestion: many recursive cases do not interleave locks, meaning the recursive enter will happen with the lock/oop already at the top of the lock-stack. Why not peek at the top lock/oop in the lock-stack: if it is the current one, just push it again and the locking is done (instead of inflating)? (The exit would need to check whether this is the last entry and then do a proper exit.) Are you worried about the size of the lock-stack? ------------- PR: https://git.openjdk.org/jdk/pull/10590 From tschatzl at openjdk.org Mon Oct 24 11:45:51 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 24 Oct 2022 11:45:51 GMT Subject: RFR: 8295808: GrowableArray should support capacity management In-Reply-To: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> References: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> Message-ID: <2DpBVXzJxwTGiorw3S08xXoPgqiAXR0-bydk93OzapE=.51a049f7-13a7-4ffc-9aa1-9eb2033872fa@github.com> On Sat, 22 Oct 2022 01:38:44 GMT, Kim Barrett wrote: > Please review this change to GrowableArray to support capacity management. > Two functions are added to GrowableArray, reserve and shrink_to_fit. Also > renamed the max_length function to capacity. > > Used these new functions in StringDedupTable. > > Testing: mach5 tier1-3 Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10827 From mneugschwand at openjdk.org Mon Oct 24 11:54:55 2022 From: mneugschwand at openjdk.org (Matthias Neugschwandtner) Date: Mon, 24 Oct 2022 11:54:55 GMT Subject: RFR: 8295776: [JVMCI] Add x86 CPU flags for MPK and CET In-Reply-To: References: Message-ID: On Fri, 21 Oct 2022 16:18:17 GMT, Vladimir Kozlov wrote: > Changes look fine. I approve them. > > The only confusion I have is platform name in title (yes, I know it is because Graal still uses AMD64 as platform name). May be better use x86 in title (I assume it is not 64-bit specific feature). I see these features are implemented in Intel's CPUs. Do AMD also have them? Hi @vnkozlov, thank you for the review! I adjusted the PR title to use x86 instead of AMD64. As for the features themselves, yes, AMD CPUs have them as well.
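Picking up the recursive-locking question Robbin raises above for the fast-locking PR (pull/10590), here is a minimal, hypothetical model of the "peek at the top of the lock-stack" idea. It is not the HotSpot LockStack code: `oop` is stood in by a plain pointer, the capacity is made up, and the overflow/contention paths are left out on purpose.

    // Model of a per-thread lock-stack where a re-enter of the object already
    // on top is treated as a recursive acquire by pushing it again.
    using oop = void*;  // stand-in for the real oop type

    struct LockStack {
      static const int CAPACITY = 8;   // arbitrary for the sketch
      oop _elems[CAPACITY];
      int _top = 0;

      // Recursive enter fast path: same oop on top, push it again instead of
      // inflating to an ObjectMonitor.
      bool try_recursive_enter(oop o) {
        if (_top > 0 && _top < CAPACITY && _elems[_top - 1] == o) {
          _elems[_top++] = o;
          return true;
        }
        return false;  // first enter (CAS the mark word) or inflate instead
      }

      // Matching exit: if the two top slots hold the same oop, this exit only
      // pops; the real unlock (restoring the mark word) happens on the last pop.
      bool try_recursive_exit(oop o) {
        if (_top > 1 && _elems[_top - 1] == o && _elems[_top - 2] == o) {
          _top--;
          return true;
        }
        return false;
      }
    };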
------------- PR: https://git.openjdk.org/jdk/pull/10810 From dnsimon at openjdk.org Mon Oct 24 11:59:01 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 24 Oct 2022 11:59:01 GMT Subject: RFR: 8295776: [JVMCI] Add x86 CPU flags for MPK and CET In-Reply-To: References: Message-ID: On Fri, 21 Oct 2022 09:48:23 GMT, Matthias Neugschwandtner wrote: > Add the CPU flags for memory protection keys (MPK) and control flow enforcement technology (CET) to JVMCI such that Graal can make use of these technologies. Marked as reviewed by dnsimon (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/10810 From fyang at openjdk.org Mon Oct 24 12:10:53 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 24 Oct 2022 12:10:53 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> Message-ID: On Mon, 24 Oct 2022 09:33:07 GMT, Ludovic Henry wrote: >>> We can already do auto-detection via `/proc/cpuinfo`. I agree it's not the best way nor the most stable way, and I can only assume there will be a better way, but there is already a way. >> >> I think I mean extensions like 'Zba', 'Zbb', etc here. >> Are you sure those could be detected from `/proc/cpuinfo`? >> I am assuming that only works for extensions contained in RV64GCV (IMAFDCV). > > @RealFYang I've done the change as you requested. Thanks for the update. Looks like DEFAULT_CACHE_LINE_SIZE is not defined anywhere? Looks good, otherwise. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From mneugschwand at openjdk.org Mon Oct 24 12:13:09 2022 From: mneugschwand at openjdk.org (Matthias Neugschwandtner) Date: Mon, 24 Oct 2022 12:13:09 GMT Subject: Integrated: 8295776: [JVMCI] Add x86 CPU flags for MPK and CET In-Reply-To: References: Message-ID: On Fri, 21 Oct 2022 09:48:23 GMT, Matthias Neugschwandtner wrote: > Add the CPU flags for memory protection keys (MPK) and control flow enforcement technology (CET) to JVMCI such that Graal can make use of these technologies. This pull request has now been integrated. Changeset: d50b6eb3 Author: Matthias Neugschwandtner Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/d50b6eb342e9ec96d1a01dafc317e00725dc84c0 Stats: 37 lines in 4 files changed: 34 ins; 0 del; 3 mod 8295776: [JVMCI] Add x86 CPU flags for MPK and CET Reviewed-by: kvn, dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/10810 From luhenry at openjdk.org Mon Oct 24 12:27:34 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 24 Oct 2022 12:27:34 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> Message-ID: On Mon, 24 Oct 2022 12:08:53 GMT, Fei Yang wrote: >> @RealFYang I've done the change as you requested. > > Thanks for the update. Looks like DEFAULT_CACHE_LINE_SIZE is not defined anywhere? Looks good, otherwise. It's defined in `src/hotspot/share/utilities/globalDefinitions.hpp` but let me add it in `src/hotspot/cpu/riscv/globalDefinitions_riscv.hpp` just to make sure. 
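For context on the DEFAULT_CACHE_LINE_SIZE exchange just above, the definition being talked about is a one-liner of roughly this shape (illustrative; 64 is the common default and, as the next reply in the thread points out, only a safe assumption when Zic64b is actually reported):

    // globalDefinitions_riscv.hpp (sketch)
    #define DEFAULT_CACHE_LINE_SIZE 64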
------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Mon Oct 24 12:33:04 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 24 Oct 2022 12:33:04 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v15] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: <3QnFFt3ImWh7WTMqJXTaYnJ_KDNxfIfOcDfSBGoHmfI=.e4debcdb-0b06-46a3-b02e-877f0b4710c9@github.com> > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: Make sure DEFAULT_CACHE_LINE_SIZE is defined for risc-v ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/021f53d9..57910650 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=13-14 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From vkempik at openjdk.org Mon Oct 24 12:58:50 2022 From: vkempik at openjdk.org (Vladimir Kempik) Date: Mon, 24 Oct 2022 12:58:50 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> Message-ID: On Mon, 24 Oct 2022 12:23:58 GMT, Ludovic Henry wrote: >> Thanks for the update. Looks like DEFAULT_CACHE_LINE_SIZE is not defined anywhere? Looks good, otherwise. > > It's defined in `src/hotspot/share/utilities/globalDefinitions.hpp` but let me add it in `src/hotspot/cpu/riscv/globalDefinitions_riscv.hpp` just to make sure. > I overall agree with the 64 bytes is an industry standard. What I want to ensure is that we can eagerly enable `UseBlockZeroing` whenever we can. I strongly disagree. Until processor reports Zic64b supported we should assume the cache line size could be anything (2^N). I'm running some tests on risc-v fpga core and it has 16-bytes cache block size. having to patch ( change default cache line size) and rebuild openjdk to just try new Zicbo{z,p,m} opcodes ( once they are implemented) - isn't a nice way ------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Mon Oct 24 13:49:01 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 24 Oct 2022 13:49:01 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v16] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. 
This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: Fix cbo_zero encoding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/57910650..3156dc9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From iwalulya at openjdk.org Mon Oct 24 15:57:49 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 24 Oct 2022 15:57:49 GMT Subject: RFR: 8295118: G1: Clear CLD claim marks concurrently In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 12:49:42 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that moves out clearing CLD marks from the concurrent start pause to the concurrent phase? > > The idea is that instead of clearing CLD marks just before marking through, clear the marks at the end of the concurrent phases (or at the end of the full gc) so that after that operation marks are reset. > > I believe that one can save one of the `ClassLoaderDataGraph::clear_claimed_marks` in full gc by using different claim values (we need the one at the beginning and the end though), but the overhead of that should be minimal compared to actual full gc time. > > Testing: tier1-5 Marked as reviewed by iwalulya (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10675 From coleenp at openjdk.org Mon Oct 24 18:12:51 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 24 Oct 2022 18:12:51 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v11] In-Reply-To: <2eVkXntYdW-ZE-7dIFrg13zgEeUCSNvs9k_q1LeJXaA=.4349bafb-f1c1-4c80-9191-37a0dba2810f@github.com> References: <867TdKmJce7HpB-gIDtPbGAnZlEloOjgLjRfFIwGZgk=.2d7b3a8d-a54a-4363-81ef-a175a34aeebe@github.com> <4pVu3JYIKBDS9jMG47NPjqNmDCv_ZWX6h7CsAALE8So=.d64963a1-f126-4d9a-bdec-52f6b9ebfc17@github.com> <2eVkXntYdW-ZE-7dIFrg13zgEeUCSNvs9k_q1LeJXaA=.4349bafb-f1c1-4c80-9191-37a0dba2810f@github.com> Message-ID: On Sun, 23 Oct 2022 05:31:12 GMT, Ioi Lam wrote: >> src/hotspot/share/interpreter/bytecodeTracer.cpp line 171: >> >>> 169: void BytecodeTracer::trace_interpreter(const methodHandle& method, address bcp, uintptr_t tos, uintptr_t tos2, outputStream* st) { >>> 170: if (TraceBytecodes && BytecodeCounter::counter_value() >= TraceBytecodesAt) { >>> 171: ttyLocker ttyl; // 5065316: keep the following output coherent >> >> We should file a a RFE to make this a leaf mutex rather than ttyLocker (which does get broken to allow safepoints!) with no_safepoint_check assuming that the printing doesn't safepoint. Good to remove the no-longer accurate comment. > > The comment also says: > > > // Using the ttyLocker prevents the system from coming to > // a safepoint within this code, which is sensitive to Method* > // movement. > > > So are we trying to "prevent Method movement"? Does this even make sense after we switched to Permgen? Nope doesn't make sense. 
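As a hedged illustration of the direction this ttyLocker exchange points at (build each trace line off to the side and emit it with a single call, so line coherence does not depend on holding the tty lock across the whole trace): the names below follow the quoted code, but this is only a sketch, not the fix tracked later in the thread.

    // Sketch: buffer one bytecode's trace output, then print it atomically.
    // Assumes the usual HotSpot includes for ResourceMark and stringStream.
    static void trace_one_bytecode(outputStream* st) {
      ResourceMark rm;
      stringStream ss;
      ss.print("[%ld] ", (long) BytecodeCounter::counter_value());
      // ... decode the bytecode at the current bcp into ss ...
      st->print_raw(ss.base());   // one call per line keeps lines whole
    }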
>> src/hotspot/share/utilities/debug.cpp line 664: >> >>> 662: } >>> 663: >>> 664: extern "C" JNIEXPORT void printclass(intptr_t k, int flags) { >> >> It's weird that the function to print the method is findmethod, and the one to print the class is printclass. I see that findmethod is consistent with the 'find' functions above. Maybe this should be findclass() too. > > I had two groups of printing functions with the following intention: > > - findclass/findmethod will search for class/methods by their names. Call these when you don't have an InstanceKlass* or Method* > - printclass/printmethod should be used when you already have an InstanceKlass* or Method* > > However, I realized that the second group of methods should really be something like InstanceKlass::print_xxx() and Method::print_xxx(), so you can do something like this in gdb > > > call ik->print_all_methods_with_bytecodes(); > > > I have removed the second group of functions from this PR (commit [6675901](https://github.com/openjdk/jdk/pull/9957/commits/6675901125018853bad316a62702a8f7c62f331b)). I may add them in a follow-up PR if necessary. Yes, I agree. If you already have the InstanceKlass, it should be a function using that as a 'this' pointer. ------------- PR: https://git.openjdk.org/jdk/pull/9957 From coleenp at openjdk.org Mon Oct 24 18:17:17 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 24 Oct 2022 18:17:17 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v16] In-Reply-To: References: Message-ID: <0WB0qX9AA0ivRmU2aVbeI-sDMfXMLRN09KOvahg2p6E=.3929578f-317d-40e1-9d46-9893a5b83868@github.com> On Sun, 23 Oct 2022 19:43:51 GMT, Ioi Lam wrote: >> Current in gdb, you can print information about a class or method with something like >> >> >> call ((InstanceKlass*)0x00000008000411b8)->print_on(tty) >> call ((Method*)0x00007fffb4000d08)->print_codes_on(tty) >> >> >> However, it's difficult to find a class or method by its name and print out its contents. >> >> This RFE adds 3 new functions in debug.cpp so you can easily find classes/methods and print out their contents. They all have a `flags` argument that controls the verbosity. >> >> - `findclass()`: class name only >> - `findmethod()`: class name and method name >> - `findmethod2()`: class name and method name/signature >> >> I also cleaned up `BytecodeTracer` to remove unnecessary complexity. 
>> >> Here are some examples: >> >> >> (gdb) call findclass("java/lang/Object", 0) >> [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' >> >> (gdb) call findclass("java/lang/Object", 1) >> [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' >> 0x00007fffb4000658 : ()V >> 0x00007fffb40010f0 finalize : ()V >> 0x00007fffb4000f00 wait0 : (J)V >> 0x00007fffb40008e8 equals : (Ljava/lang/Object;)Z >> 0x00007fffb4000aa0 toString : ()Ljava/lang/String; >> 0x00007fffb40007f0 hashCode : ()I >> 0x00007fffb4000720 getClass : ()Ljava/lang/Class; >> 0x00007fffb40009a0 clone : ()Ljava/lang/Object; >> 0x00007fffb4000b50 notify : ()V >> 0x00007fffb4000c20 notifyAll : ()V >> 0x00007fffb4000e50 wait : (J)V >> 0x00007fffb4001028 wait : (JI)V >> 0x00007fffb4000d08 wait : ()V >> >> (gdb) call findclass("*ClassLoader", 0) >> [0] 0x000000080007de40 jdk.internal.loader.ClassLoaders$BootClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' >> [1] 0x0000000800053c58 jdk.internal.loader.ClassLoaders$PlatformClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' >> [2] 0x0000000800053918 jdk.internal.loader.ClassLoaders$AppClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' >> [....] >> >> (gdb) call findmethod2("*ang/Object*", "wait", "()V", 0x7) >> [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' >> 0x00007fffb4000d08 wait : ()V >> 0x00007fffb4000ce8 0 fast_aload_0 >> 0x00007fffb4000ce9 1 lconst_0 >> 0x00007fffb4000cea 2 invokevirtual 38 >> 0x00007fffb4000ced 5 return > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > One more ResourceMark fix; fixed gtest case Nice, I like the gtest. src/hotspot/share/oops/symbol.cpp line 147: > 145: } > 146: > 147: bool Symbol::is_star_match(const char* star_pattern) const { So this variable name should be "pattern" not "star_pattern" since it may not have a star. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/9957 From coleenp at openjdk.org Mon Oct 24 18:17:17 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 24 Oct 2022 18:17:17 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v7] In-Reply-To: References: Message-ID: On Thu, 22 Sep 2022 18:13:39 GMT, Ioi Lam wrote: >> src/hotspot/share/interpreter/bytecodeTracer.cpp line 434: >> >>> 432: // TODO: print info for tag.is_dynamic_constant() >>> 433: } >>> 434: } >> >> This should be a function in ConstantPool::print() > > This block of code works on an `indy_index`, which is stored only inside the bytecode stream (it's the rewritten index of the [invokedynamic bytecode](https://docs.oracle.com/javase/specs/jvms/se17/html/jvms-6.html#jvms-6.5.invokedynamic). `indy_index` is not store anywhere inside the `ConstantPool`, so I can't find a good place to print this info in the printing functions of `ConstantPool`. > > Note that you can have multiple `invokedynamic` bytecodes that use the same `JVM_CONSTANT_InvokeDynamic` entry in a `ConstantPool`, resulting in a different `indy_index` for each call site. 
Therefore, this per-callsite information cannot be printed as part of the `JVM_CONSTANT_InvokeDynamic` entry ok ------------- PR: https://git.openjdk.org/jdk/pull/9957 From sviswanathan at openjdk.org Mon Oct 24 18:26:51 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 24 Oct 2022 18:26:51 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v3] In-Reply-To: References: Message-ID: On Fri, 21 Oct 2022 20:20:58 GMT, vpaprotsk wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > vpaprotsk has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > further restrict UsePolyIntrinsics with supports_avx512vlbw @ascarpino Could you please also take a look at this PR? ------------- PR: https://git.openjdk.org/jdk/pull/10582 From kvn at openjdk.org Mon Oct 24 18:40:36 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 24 Oct 2022 18:40:36 GMT Subject: RFR: 8295844: jdk/test/whitebox/CPUInfoTest.java failed with "not all features are known: expected true, was false" Message-ID: [JDK-8295776](https://github.com/openjdk/jdk/commit/d50b6eb342e9ec96d1a01dafc317e00725dc84c0) added new x86 CPU flags checks but did not update CPUInfoTest.java. Add missing x86 CPU's flags to CPUInfoTest.java. Tested with tier1 when CPUInfoTest.java is ran. 
------------- Commit messages: - Update comment - Added comment and aligned flags - 8295844: jdk/test/whitebox/CPUInfoTest.java failed with "not all features are known: expected true, was false" Changes: https://git.openjdk.org/jdk/pull/10837/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10837&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295844 Stats: 7 lines in 2 files changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10837.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10837/head:pull/10837 PR: https://git.openjdk.org/jdk/pull/10837 From tschatzl at openjdk.org Mon Oct 24 18:40:37 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 24 Oct 2022 18:40:37 GMT Subject: RFR: 8295844: jdk/test/whitebox/CPUInfoTest.java failed with "not all features are known: expected true, was false" In-Reply-To: References: Message-ID: On Mon, 24 Oct 2022 18:13:04 GMT, Vladimir Kozlov wrote: > [JDK-8295776](https://github.com/openjdk/jdk/commit/d50b6eb342e9ec96d1a01dafc317e00725dc84c0) added new x86 CPU flags checks but did not update CPUInfoTest.java. > Add missing x86 CPU's flags to CPUInfoTest.java. > > Tested with tier1 when CPUInfoTest.java is ran. Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.org/jdk/pull/10837 From dnsimon at openjdk.org Mon Oct 24 18:40:38 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 24 Oct 2022 18:40:38 GMT Subject: RFR: 8295844: jdk/test/whitebox/CPUInfoTest.java failed with "not all features are known: expected true, was false" In-Reply-To: References: Message-ID: On Mon, 24 Oct 2022 18:13:04 GMT, Vladimir Kozlov wrote: > [JDK-8295776](https://github.com/openjdk/jdk/commit/d50b6eb342e9ec96d1a01dafc317e00725dc84c0) added new x86 CPU flags checks but did not update CPUInfoTest.java. > Add missing x86 CPU's flags to CPUInfoTest.java. > > Tested with tier1 when CPUInfoTest.java is ran. Marked as reviewed by dnsimon (Committer). src/hotspot/cpu/x86/vm_version_x86.hpp line 315: > 313: > 314: /* > 315: * Update next files when declaring new flags: Update *following* files... ------------- PR: https://git.openjdk.org/jdk/pull/10837 From kvn at openjdk.org Mon Oct 24 18:40:39 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 24 Oct 2022 18:40:39 GMT Subject: RFR: 8295844: jdk/test/whitebox/CPUInfoTest.java failed with "not all features are known: expected true, was false" In-Reply-To: References: Message-ID: On Mon, 24 Oct 2022 18:13:04 GMT, Vladimir Kozlov wrote: > [JDK-8295776](https://github.com/openjdk/jdk/commit/d50b6eb342e9ec96d1a01dafc317e00725dc84c0) added new x86 CPU flags checks but did not update CPUInfoTest.java. > Add missing x86 CPU's flags to CPUInfoTest.java. > > Tested with tier1 when CPUInfoTest.java is ran. There were 2 cases when we forgot update CPUInfoTest.java and JVMCI's AMD64.java. I added comment to vm_version_x86.hpp to remind about that. I also aligned flags in CPUInfoTest.java to match existing padding. 
------------- PR: https://git.openjdk.org/jdk/pull/10837 From kvn at openjdk.org Mon Oct 24 18:40:41 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 24 Oct 2022 18:40:41 GMT Subject: RFR: 8295844: jdk/test/whitebox/CPUInfoTest.java failed with "not all features are known: expected true, was false" In-Reply-To: References: Message-ID: On Mon, 24 Oct 2022 18:30:28 GMT, Doug Simon wrote: >> [JDK-8295776](https://github.com/openjdk/jdk/commit/d50b6eb342e9ec96d1a01dafc317e00725dc84c0) added new x86 CPU flags checks but did not update CPUInfoTest.java. >> Add missing x86 CPU's flags to CPUInfoTest.java. >> >> Tested with tier1 when CPUInfoTest.java is ran. > > src/hotspot/cpu/x86/vm_version_x86.hpp line 315: > >> 313: >> 314: /* >> 315: * Update next files when declaring new flags: > > Update *following* files... done ------------- PR: https://git.openjdk.org/jdk/pull/10837 From iklam at openjdk.org Mon Oct 24 18:56:50 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 24 Oct 2022 18:56:50 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v11] In-Reply-To: References: <867TdKmJce7HpB-gIDtPbGAnZlEloOjgLjRfFIwGZgk=.2d7b3a8d-a54a-4363-81ef-a175a34aeebe@github.com> <4pVu3JYIKBDS9jMG47NPjqNmDCv_ZWX6h7CsAALE8So=.d64963a1-f126-4d9a-bdec-52f6b9ebfc17@github.com> <2eVkXntYdW-ZE-7dIFrg13zgEeUCSNvs9k_q1LeJXaA=.4349bafb-f1c1-4c80-9191-37a0dba2810f@github.com> Message-ID: On Mon, 24 Oct 2022 18:06:38 GMT, Coleen Phillimore wrote: >> The comment also says: >> >> >> // Using the ttyLocker prevents the system from coming to >> // a safepoint within this code, which is sensitive to Method* >> // movement. >> >> >> So are we trying to "prevent Method movement"? Does this even make sense after we switched to Permgen? > > Nope doesn't make sense. I filed https://bugs.openjdk.org/browse/JDK-8295851 "Do not use ttyLock in BytecodeTracer::trace" ------------- PR: https://git.openjdk.org/jdk/pull/9957 From iklam at openjdk.org Mon Oct 24 19:01:56 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 24 Oct 2022 19:01:56 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v17] In-Reply-To: References: Message-ID: <2ZqAGDCecjWgLCNC7_ImiMkGN4iJMXk6TFs1HsrDT-A=.b5b32b28-3905-405a-9297-d7b60a44032e@github.com> > Current in gdb, you can print information about a class or method with something like > > > call ((InstanceKlass*)0x00000008000411b8)->print_on(tty) > call ((Method*)0x00007fffb4000d08)->print_codes_on(tty) > > > However, it's difficult to find a class or method by its name and print out its contents. > > This RFE adds 3 new functions in debug.cpp so you can easily find classes/methods and print out their contents. They all have a `flags` argument that controls the verbosity. > > - `findclass()`: class name only > - `findmethod()`: class name and method name > - `findmethod2()`: class name and method name/signature > > I also cleaned up `BytecodeTracer` to remove unnecessary complexity. 
> > Here are some examples: > > > (gdb) call findclass("java/lang/Object", 0) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > > (gdb) call findclass("java/lang/Object", 1) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000658 : ()V > 0x00007fffb40010f0 finalize : ()V > 0x00007fffb4000f00 wait0 : (J)V > 0x00007fffb40008e8 equals : (Ljava/lang/Object;)Z > 0x00007fffb4000aa0 toString : ()Ljava/lang/String; > 0x00007fffb40007f0 hashCode : ()I > 0x00007fffb4000720 getClass : ()Ljava/lang/Class; > 0x00007fffb40009a0 clone : ()Ljava/lang/Object; > 0x00007fffb4000b50 notify : ()V > 0x00007fffb4000c20 notifyAll : ()V > 0x00007fffb4000e50 wait : (J)V > 0x00007fffb4001028 wait : (JI)V > 0x00007fffb4000d08 wait : ()V > > (gdb) call findclass("*ClassLoader", 0) > [0] 0x000000080007de40 jdk.internal.loader.ClassLoaders$BootClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [1] 0x0000000800053c58 jdk.internal.loader.ClassLoaders$PlatformClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [2] 0x0000000800053918 jdk.internal.loader.ClassLoaders$AppClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [....] > > (gdb) call findmethod2("*ang/Object*", "wait", "()V", 0x7) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000d08 wait : ()V > 0x00007fffb4000ce8 0 fast_aload_0 > 0x00007fffb4000ce9 1 lconst_0 > 0x00007fffb4000cea 2 invokevirtual 38 > 0x00007fffb4000ced 5 return Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into 8292699-improve-class-printing-in-gdb - @coleenp comment - rename star_pattern to pattern - One more ResourceMark fix; fixed gtest case - added missing ResourceMark - fixed bug where findclass("*", 0x1) does not print classes with no methods - removed the use of "continue" keyword; avoid printing class name when none of its methods match - added gtest case - removed unnecessary code; simplified start matching; added Symbol::is_star_match - Merge branch 'master' into 8292699-improve-class-printing-in-gdb - @coleenp comments - ... 
and 11 more: https://git.openjdk.org/jdk/compare/e4afe352...22a7bb2b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9957/files - new: https://git.openjdk.org/jdk/pull/9957/files/92adeaa7..22a7bb2b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9957&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9957&range=15-16 Stats: 101997 lines in 440 files changed: 69069 ins; 4392 del; 28536 mod Patch: https://git.openjdk.org/jdk/pull/9957.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9957/head:pull/9957 PR: https://git.openjdk.org/jdk/pull/9957 From iklam at openjdk.org Mon Oct 24 19:01:58 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 24 Oct 2022 19:01:58 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v16] In-Reply-To: <0WB0qX9AA0ivRmU2aVbeI-sDMfXMLRN09KOvahg2p6E=.3929578f-317d-40e1-9d46-9893a5b83868@github.com> References: <0WB0qX9AA0ivRmU2aVbeI-sDMfXMLRN09KOvahg2p6E=.3929578f-317d-40e1-9d46-9893a5b83868@github.com> Message-ID: On Mon, 24 Oct 2022 18:13:42 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> One more ResourceMark fix; fixed gtest case > > src/hotspot/share/oops/symbol.cpp line 147: > >> 145: } >> 146: >> 147: bool Symbol::is_star_match(const char* star_pattern) const { > > So this variable name should be "pattern" not "star_pattern" since it may not have a star. Fixed. ------------- PR: https://git.openjdk.org/jdk/pull/9957 From kvn at openjdk.org Mon Oct 24 19:16:54 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 24 Oct 2022 19:16:54 GMT Subject: Integrated: 8295844: jdk/test/whitebox/CPUInfoTest.java failed with "not all features are known: expected true, was false" In-Reply-To: References: Message-ID: On Mon, 24 Oct 2022 18:13:04 GMT, Vladimir Kozlov wrote: > [JDK-8295776](https://github.com/openjdk/jdk/commit/d50b6eb342e9ec96d1a01dafc317e00725dc84c0) added new x86 CPU flags checks but did not update CPUInfoTest.java. > Add missing x86 CPU's flags to CPUInfoTest.java. > > Tested with tier1 when CPUInfoTest.java is ran. This pull request has now been integrated. Changeset: e122321c Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/e122321cb599d2e0041029b34b306ce88117aef7 Stats: 7 lines in 2 files changed: 6 ins; 0 del; 1 mod 8295844: jdk/test/whitebox/CPUInfoTest.java failed with "not all features are known: expected true, was false" Reviewed-by: tschatzl, dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/10837 From sviswanathan at openjdk.org Mon Oct 24 20:35:53 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 24 Oct 2022 20:35:53 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v3] In-Reply-To: References: Message-ID: On Fri, 21 Oct 2022 20:20:58 GMT, vpaprotsk wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. 
>> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > vpaprotsk has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > further restrict UsePolyIntrinsics with supports_avx512vlbw test/micro/org/openjdk/bench/javax/crypto/full/Poly1305DigestBench.java line 37: > 35: import java.security.spec.AlgorithmParameterSpec; > 36: import javax.crypto.spec.SecretKeySpec; > 37: Please add the following: import org.openjdk.jmh.annotations.Fork; @Fork(value = 1, jvmArgsAppend = {"--add-opens", "java.base/com.sun.crypto.provider=A LL-UNNAMED"}) ------------- PR: https://git.openjdk.org/jdk/pull/10582 From sspitsyn at openjdk.org Mon Oct 24 21:31:39 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 24 Oct 2022 21:31:39 GMT Subject: RFR: 8294486: Remove vmTestbase/nsk/jvmti/ tests ported to serviceability/jvmti. [v2] In-Reply-To: References: Message-ID: On Fri, 21 Oct 2022 21:39:23 GMT, Leonid Mesnik wrote: >> The fix removes nsk/jvmti/ tests ported to serviceability/jvmti and forward-ports corresponding fixed. The suspend/resume tests require more work covered by https://bugs.openjdk.org/browse/JDK-8295169. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fix Looks good. Thanks, Serguei Thank you for the update. One more question. A couple of tests are not listed in the `test/hotspot/jtreg/TEST.quick-groups`: test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetFrameLocation/frameloc001 test/hotspot/jtreg/vmTestbase/nsk/jvmti/SingleStep/singlestep002 Is it because these tests were initially missed to be added into this file? ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10665 From lmesnik at openjdk.org Mon Oct 24 21:44:52 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 24 Oct 2022 21:44:52 GMT Subject: RFR: 8294486: Remove vmTestbase/nsk/jvmti/ tests ported to serviceability/jvmti. [v2] In-Reply-To: References: Message-ID: On Mon, 24 Oct 2022 21:28:55 GMT, Serguei Spitsyn wrote: > Thank you for the update. One more question. A couple of tests are not listed in the `test/hotspot/jtreg/TEST.quick-groups`: > > ``` > test/hotspot/jtreg/vmTestbase/nsk/jvmti/GetFrameLocation/frameloc001 > test/hotspot/jtreg/vmTestbase/nsk/jvmti/SingleStep/singlestep002 > ``` > > Is it because these tests were initially missed to be added into this file? This file contains only tests which had 'quick' keyword before conversion from tonga. 
So, they might be just were not supposed to add to quick group. ------------- PR: https://git.openjdk.org/jdk/pull/10665 From sspitsyn at openjdk.org Mon Oct 24 22:04:56 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 24 Oct 2022 22:04:56 GMT Subject: RFR: 8294486: Remove vmTestbase/nsk/jvmti/ tests ported to serviceability/jvmti. [v2] In-Reply-To: References: Message-ID: <9MlcdLqsWOfFFT6UqXZOYvKYL2WaaGC9oCJq-r93RxY=.0ec49354-805b-4e20-93c0-dccec34af82b@github.com> On Fri, 21 Oct 2022 21:39:23 GMT, Leonid Mesnik wrote: >> The fix removes nsk/jvmti/ tests ported to serviceability/jvmti and forward-ports corresponding fixed. The suspend/resume tests require more work covered by https://bugs.openjdk.org/browse/JDK-8295169. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fix Okay, thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10665 From duke at openjdk.org Mon Oct 24 22:06:56 2022 From: duke at openjdk.org (vpaprotsk) Date: Mon, 24 Oct 2022 22:06:56 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v4] In-Reply-To: References: Message-ID: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s vpaprotsk has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - assembler checks and test case fixes - Merge remote-tracking branch 'origin/master' into avx512-poly - Merge remote-tracking branch 'origin' into avx512-poly - further restrict UsePolyIntrinsics with supports_avx512vlbw - missed white-space fix - - Fix whitespace and copyright statements - Add benchmark - Merge remote-tracking branch 'vpaprotsk/master' into avx512-poly - Poly1305 AVX512 intrinsic for x86_64 ------------- Changes: https://git.openjdk.org/jdk/pull/10582/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=03 Stats: 1719 lines in 30 files changed: 1685 ins; 3 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Mon Oct 24 22:06:57 2022 From: duke at openjdk.org (vpaprotsk) Date: Mon, 24 Oct 2022 22:06:57 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v4] In-Reply-To: References: Message-ID: On Tue, 18 Oct 2022 06:26:38 GMT, Jatin Bhateja wrote: >> vpaprotsk has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: >> >> - assembler checks and test case fixes >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Merge remote-tracking branch 'origin' into avx512-poly >> - further restrict UsePolyIntrinsics with supports_avx512vlbw >> - missed white-space fix >> - - Fix whitespace and copyright statements >> - Add benchmark >> - Merge remote-tracking branch 'vpaprotsk/master' into avx512-poly >> - Poly1305 AVX512 intrinsic for x86_64 > > src/hotspot/cpu/x86/assembler_x86.cpp line 5484: > >> 5482: >> 5483: void Assembler::evpunpckhqdq(XMMRegister dst, KRegister mask, XMMRegister src1, XMMRegister src2, bool merge, int vector_len) { >> 5484: assert(UseAVX > 2, "requires AVX512F"); > > Please replace flag with feature EVEX check. done > src/hotspot/cpu/x86/assembler_x86.cpp line 7831: > >> 7829: >> 7830: void Assembler::vpandq(XMMRegister dst, XMMRegister nds, Address src, int vector_len) { >> 7831: assert(VM_Version::supports_evex(), ""); > > Assertion should check existence of AVX512VL for non 512 but vectors. done > src/hotspot/cpu/x86/assembler_x86.cpp line 7958: > >> 7956: >> 7957: void Assembler::vporq(XMMRegister dst, XMMRegister nds, Address src, int vector_len) { >> 7958: assert(VM_Version::supports_evex(), ""); > > Same as above done > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 1960: > >> 1958: address StubGenerator::generate_poly1305_masksCP() { >> 1959: StubCodeMark mark(this, "StubRoutines", "generate_poly1305_masksCP"); >> 1960: address start = __ pc(); > > You may use [align64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp#L777) here, like done ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Mon Oct 24 22:06:58 2022 From: duke at openjdk.org (vpaprotsk) Date: Mon, 24 Oct 2022 22:06:58 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v4] In-Reply-To: <523ASDMlZe7mAZaBQe3ipxBLaLum7_XZqLLUUgsCJi0=.db28f521-c957-4fb2-8dcc-7c09d46189e3@github.com> References: <523ASDMlZe7mAZaBQe3ipxBLaLum7_XZqLLUUgsCJi0=.db28f521-c957-4fb2-8dcc-7c09d46189e3@github.com> Message-ID: On Tue, 18 Oct 2022 23:03:55 GMT, Sandhya Viswanathan wrote: >> vpaprotsk has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: >> >> - assembler checks and test case fixes >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Merge remote-tracking branch 'origin' into avx512-poly >> - further restrict UsePolyIntrinsics with supports_avx512vlbw >> - missed white-space fix >> - - Fix whitespace and copyright statements >> - Add benchmark >> - Merge remote-tracking branch 'vpaprotsk/master' into avx512-poly >> - Poly1305 AVX512 intrinsic for x86_64 > > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 262: > >> 260: private static void processMultipleBlocks(byte[] input, int offset, int length, byte[] aBytes, byte[] rBytes) { >> 261: MutableIntegerModuloP A = ipl1305.getElement(aBytes).mutable(); >> 262: MutableIntegerModuloP R = ipl1305.getElement(rBytes).mutable(); > > R doesn't need to be mutable. done > test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/java.base/com/sun/crypto/provider/Poly1305IntrinsicFuzzTest.java line 39: > >> 37: public static void main(String[] args) throws Exception { >> 38: //Note: it might be useful to increase this number during development of new Poly1305 intrinsics >> 39: final int repeat = 100; > > Should we increase this repeat count for the c2 compiler to kick in for compiling engineUpdate() and have the call to stub in place from there? did it with `@run main/othervm -Xcomp -XX:-TieredCompilation com.sun.crypto.provider.Cipher.ChaCha20.Poly1305UnitTestDriver` > test/jdk/com/sun/crypto/provider/Cipher/ChaCha20/unittest/java.base/com/sun/crypto/provider/Poly1305KAT.java line 133: > >> 131: System.out.println("*** Test " + ++testNumber + ": " + >> 132: test.testName); >> 133: if (runSingleTest(test)) { > > runSingleTest may need to be called enough number of times for the engineUpdate to be compiled by c2. added a second copy with `@run main/othervm -Xcomp -XX:-TieredCompilation com.sun.crypto.provider.Cipher.ChaCha20.Poly1305UnitTestDriver` ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Mon Oct 24 22:07:00 2022 From: duke at openjdk.org (vpaprotsk) Date: Mon, 24 Oct 2022 22:07:00 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v3] In-Reply-To: References: Message-ID: On Mon, 24 Oct 2022 20:31:31 GMT, Sandhya Viswanathan wrote: >> vpaprotsk has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> further restrict UsePolyIntrinsics with supports_avx512vlbw > > test/micro/org/openjdk/bench/javax/crypto/full/Poly1305DigestBench.java line 37: > >> 35: import java.security.spec.AlgorithmParameterSpec; >> 36: import javax.crypto.spec.SecretKeySpec; >> 37: > > Please add the following: > import org.openjdk.jmh.annotations.Fork; > @Fork(value = 1, jvmArgsAppend = {"--add-opens", "java.base/com.sun.crypto.provider=A > LL-UNNAMED"}) done. Also added longer warmup ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Mon Oct 24 22:09:29 2022 From: duke at openjdk.org (vpaprotsk) Date: Mon, 24 Oct 2022 22:09:29 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: Message-ID: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> > Handcrafted x86_64 asm for Poly1305. 
Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: extra whitespace character ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/de7e138b..883be106 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From iklam at openjdk.org Mon Oct 24 22:18:48 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 24 Oct 2022 22:18:48 GMT Subject: RFR: 8292699: Improve printing of classes in native debugger [v16] In-Reply-To: <0WB0qX9AA0ivRmU2aVbeI-sDMfXMLRN09KOvahg2p6E=.3929578f-317d-40e1-9d46-9893a5b83868@github.com> References: <0WB0qX9AA0ivRmU2aVbeI-sDMfXMLRN09KOvahg2p6E=.3929578f-317d-40e1-9d46-9893a5b83868@github.com> Message-ID: On Mon, 24 Oct 2022 18:14:47 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> One more ResourceMark fix; fixed gtest case > > Nice, I like the gtest. Thanks @coleenp and @matias9927 for the review and @tstuefe for the suggestions. ------------- PR: https://git.openjdk.org/jdk/pull/9957 From iklam at openjdk.org Mon Oct 24 22:20:13 2022 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 24 Oct 2022 22:20:13 GMT Subject: Integrated: 8292699: Improve printing of classes in native debugger In-Reply-To: References: Message-ID: On Sun, 21 Aug 2022 07:08:12 GMT, Ioi Lam wrote: > Current in gdb, you can print information about a class or method with something like > > > call ((InstanceKlass*)0x00000008000411b8)->print_on(tty) > call ((Method*)0x00007fffb4000d08)->print_codes_on(tty) > > > However, it's difficult to find a class or method by its name and print out its contents. 
> > This RFE adds 3 new functions in debug.cpp so you can easily find classes/methods and print out their contents. They all have a `flags` argument that controls the verbosity. > > - `findclass()`: class name only > - `findmethod()`: class name and method name > - `findmethod2()`: class name and method name/signature > > I also cleaned up `BytecodeTracer` to remove unnecessary complexity. > > Here are some examples: > > > (gdb) call findclass("java/lang/Object", 0) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > > (gdb) call findclass("java/lang/Object", 1) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000658 : ()V > 0x00007fffb40010f0 finalize : ()V > 0x00007fffb4000f00 wait0 : (J)V > 0x00007fffb40008e8 equals : (Ljava/lang/Object;)Z > 0x00007fffb4000aa0 toString : ()Ljava/lang/String; > 0x00007fffb40007f0 hashCode : ()I > 0x00007fffb4000720 getClass : ()Ljava/lang/Class; > 0x00007fffb40009a0 clone : ()Ljava/lang/Object; > 0x00007fffb4000b50 notify : ()V > 0x00007fffb4000c20 notifyAll : ()V > 0x00007fffb4000e50 wait : (J)V > 0x00007fffb4001028 wait : (JI)V > 0x00007fffb4000d08 wait : ()V > > (gdb) call findclass("*ClassLoader", 0) > [0] 0x000000080007de40 jdk.internal.loader.ClassLoaders$BootClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [1] 0x0000000800053c58 jdk.internal.loader.ClassLoaders$PlatformClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [2] 0x0000000800053918 jdk.internal.loader.ClassLoaders$AppClassLoader, loader data: 0x00007ffff0130d10 of 'bootstrap' > [....] > > (gdb) call findmethod2("*ang/Object*", "wait", "()V", 0x7) > [0] 0x00000008000411b8 java.lang.Object, loader data: 0x00007ffff0130d10 of 'bootstrap' > 0x00007fffb4000d08 wait : ()V > 0x00007fffb4000ce8 0 fast_aload_0 > 0x00007fffb4000ce9 1 lconst_0 > 0x00007fffb4000cea 2 invokevirtual 38 > 0x00007fffb4000ced 5 return This pull request has now been integrated. Changeset: 89dafc00 Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/89dafc002f934f7381a150e3f04fd1f830d183a4 Stats: 559 lines in 16 files changed: 451 ins; 61 del; 47 mod 8292699: Improve printing of classes in native debugger Reviewed-by: coleenp ------------- PR: https://git.openjdk.org/jdk/pull/9957 From sspitsyn at openjdk.org Mon Oct 24 23:28:46 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 24 Oct 2022 23:28:46 GMT Subject: RFR: 8295808: GrowableArray should support capacity management In-Reply-To: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> References: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> Message-ID: On Sat, 22 Oct 2022 01:38:44 GMT, Kim Barrett wrote: > Please review this change to GrowableArray to support capacity management. > Two functions are added to GrowableArray, reserve and shrink_to_fit. Also > renamed the max_length function to capacity. > > Used these new functions in StringDedupTable. > > Testing: mach5 tier1-3 src/hotspot/share/utilities/growableArray.hpp line 544: > 542: if (len > 0) { > 543: new_data = static_cast(this)->allocate(); > 544: for (int i = 0; i < len; ++i) ::new (&new_data[i]) E(old_data[i]); This can be a stupid question as I'm confused a little bit. Why do we reallocate memory for data elements? Could we just move the element pointers from the `old_data`? Then, of course, there would be no need to deallocate the moved data elements. 
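To make the copy step being questioned here easier to follow, below is a minimal, self-contained sketch of the same pattern: allocate a smaller backing store, copy-construct the live elements into it with placement new, destroy the originals, and release the old buffer. The names (Buf, shrink_to_fit) and the raw malloc/free are stand-ins, not the HotSpot GrowableArray code, which goes through its own allocator; out-of-memory handling is omitted.

#include <cstdlib>
#include <new>

template <typename E>
struct Buf {
  E*  data;   // backing storage with cap slots, of which the first len are constructed
  int len;
  int cap;

  void shrink_to_fit() {
    if (cap == len) return;                  // already tight
    E* new_data = nullptr;
    if (len > 0) {
      new_data = static_cast<E*>(::malloc(sizeof(E) * len));   // smaller allocation
      for (int i = 0; i < len; ++i) {
        ::new (&new_data[i]) E(data[i]);     // copy-construct into the new storage
        data[i].~E();                        // destroy the original element
      }
    }
    ::free(data);                            // release the old, larger buffer
    data = new_data;
    cap  = len;
  }
};

Whether the elements could instead be moved, or the old storage kept and only logically trimmed, as asked above, depends on the element type and on how the allocator hands memory back; the sketch only illustrates the copy-based variant that the quoted lines use.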
------------- PR: https://git.openjdk.org/jdk/pull/10827 From sviswanathan at openjdk.org Tue Oct 25 00:34:53 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 25 Oct 2022 00:34:53 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Mon, 24 Oct 2022 22:09:29 GMT, vpaprotsk wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: > > extra whitespace character src/hotspot/cpu/x86/assembler_x86.cpp line 8306: > 8304: assert(dst != xnoreg, "sanity"); > 8305: InstructionMark im(this); > 8306: InstructionAttr attributes(vector_len, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ false, /* uses_vl */ true); no_mask_reg should be set to true here as we are not setting the mask register here. src/hotspot/cpu/x86/stubRoutines_x86.cpp line 83: > 81: address StubRoutines::x86::_join_2_3_base64 = NULL; > 82: address StubRoutines::x86::_decoding_table_base64 = NULL; > 83: address StubRoutines::x86::_poly1305_mask_addr = NULL; Please also update the copyright year to 2022 for stubRoutines_x86.cpp and hpp files. src/hotspot/cpu/x86/vm_version_x86.cpp line 925: > 923: _features &= ~CPU_AVX512_VBMI2; > 924: _features &= ~CPU_AVX512_BITALG; > 925: _features &= ~CPU_AVX512_IFMA; This should also be done under is_knights_family(). src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175: > 173: // Choice of 1024 is arbitrary, need enough data blocks to amortize conversion overhead > 174: // and not affect platforms without intrinsic support > 175: int blockMultipleLength = (len/BLOCK_LENGTH) * BLOCK_LENGTH; The ByteBuffer version can also benefit from this optimization if it has array as backing storage. 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From fyang at openjdk.org Tue Oct 25 04:04:51 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 25 Oct 2022 04:04:51 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v16] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Mon, 24 Oct 2022 13:49:01 GMT, Ludovic Henry wrote: >> Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. >> >> [1] https://github.com/riscv/riscv-CMOs >> [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero > > Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: > > Fix cbo_zero encoding src/hotspot/cpu/riscv/assembler_riscv.hpp line 2749: > 2747: INSN(prefetch_i, 0b0000000000000); > 2748: INSN(prefetch_r, 0b0000000000001); > 2749: INSN(prefetch_w, 0b0000000000010); Opcode for prefetch_w is wrong? ------------- PR: https://git.openjdk.org/jdk/pull/10718 From sspitsyn at openjdk.org Tue Oct 25 08:02:57 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 25 Oct 2022 08:02:57 GMT Subject: RFR: 8295808: GrowableArray should support capacity management In-Reply-To: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> References: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> Message-ID: On Sat, 22 Oct 2022 01:38:44 GMT, Kim Barrett wrote: > Please review this change to GrowableArray to support capacity management. > Two functions are added to GrowableArray, reserve and shrink_to_fit. Also > renamed the max_length function to capacity. > > Used these new functions in StringDedupTable. > > Testing: mach5 tier1-3 Looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10827 From fyang at openjdk.org Tue Oct 25 08:09:57 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 25 Oct 2022 08:09:57 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> Message-ID: On Mon, 24 Oct 2022 12:56:44 GMT, Vladimir Kempik wrote: > > I overall agree with the 64 bytes is an industry standard. What I want to ensure is that we can eagerly enable `UseBlockZeroing` whenever we can. > > I strongly disagree. > > Until processor reports Zic64b supported we should assume the cache line size could be anything (2^N). I'm running some tests on risc-v fpga core and it has 16-bytes cache block size. having to patch ( change default cache line size) and rebuild openjdk to just try new Zicbo{z,p,m} opcodes ( once they are implemented) - isn't a nice way Now I see the concern here. I agree to have the an option for cache-line size in that respect. 
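As a rough illustration of why the effective cache-block size matters in this discussion, the sketch below zeroes a buffer with cbo.zero one cache block at a time and falls back to ordinary stores for the unaligned head and tail. The cbo_zero() wrapper and the free-standing block_zero() function are hypothetical, not the code in this pull request (which adds the instruction at the assembler level), and the inline-assembly operand syntax plus the required -march extensions may need adjusting for a given toolchain.

#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed wrapper for the Zicboz cbo.zero instruction, which zeroes the whole
// cache block containing the given (here block-aligned) address.
static inline void cbo_zero(void* addr) {
  asm volatile("cbo.zero (%0)" : : "r"(addr) : "memory");
}

// Zero [p, p + n) using cbo.zero for whole, aligned cache blocks and ordinary
// stores for the head and tail. block_size must be the hart's actual
// cache-block size, which is why it stays configurable instead of being
// assumed to be 64 bytes.
void block_zero(uint8_t* p, size_t n, size_t block_size) {
  uintptr_t addr = reinterpret_cast<uintptr_t>(p);
  size_t head = (block_size - (addr % block_size)) % block_size;  // bytes up to the next block boundary
  if (head > n) head = n;
  std::memset(p, 0, head);
  p += head;
  n -= head;
  while (n >= block_size) {      // p is block-aligned from here on
    cbo_zero(p);
    p += block_size;
    n -= block_size;
  }
  std::memset(p, 0, n);          // remaining tail
}

If block_size is larger than the real cache block, the whole-block iterations leave bytes un-zeroed; if it is smaller, cbo.zero can clear memory outside the intended range. That is why a 64-byte assumption is only safe when something like Zic64b actually guarantees it.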
------------- PR: https://git.openjdk.org/jdk/pull/10718 From sjohanss at openjdk.org Tue Oct 25 08:11:17 2022 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 25 Oct 2022 08:11:17 GMT Subject: RFR: 8295118: G1: Clear CLD claim marks concurrently In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 12:49:42 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that moves out clearing CLD marks from the concurrent start pause to the concurrent phase? > > The idea is that instead of clearing CLD marks just before marking through, clear the marks at the end of the concurrent phases (or at the end of the full gc) so that after that operation marks are reset. > > I believe that one can save one of the `ClassLoaderDataGraph::clear_claimed_marks` in full gc by using different claim values (we need the one at the beginning and the end though), but the overhead of that should be minimal compared to actual full gc time. > > Testing: tier1-5 Looks good, just a comment/question on where to do this in the full GC. src/hotspot/share/gc/g1/g1FullGCCompactTask.cpp line 125: > 123: _claimer(collector->workers()) { > 124: // Need cleared claim bits for the next concurrent marking. > 125: ClassLoaderDataGraph::clear_claimed_marks(); Is there a good reason to do the clearing here instead of in `void G1FullCollector::complete_collection()`? ------------- Changes requested by sjohanss (Reviewer). PR: https://git.openjdk.org/jdk/pull/10675 From tschatzl at openjdk.org Tue Oct 25 08:36:11 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Oct 2022 08:36:11 GMT Subject: RFR: 8295118: G1: Clear CLD claim marks concurrently [v2] In-Reply-To: References: Message-ID: > Hi all, > > can I have reviews for this change that moves out clearing CLD marks from the concurrent start pause to the concurrent phase? > > The idea is that instead of clearing CLD marks just before marking through, clear the marks at the end of the concurrent phases (or at the end of the full gc) so that after that operation marks are reset. > > I believe that one can save one of the `ClassLoaderDataGraph::clear_claimed_marks` in full gc by using different claim values (we need the one at the beginning and the end though), but the overhead of that should be minimal compared to actual full gc time. > > Testing: tier1-5 Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: - sjohanss review fixes - sjohanss review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10675/files - new: https://git.openjdk.org/jdk/pull/10675/files/2c2ce57e..4d12b230 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10675&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10675&range=00-01 Stats: 6 lines in 2 files changed: 4 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10675.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10675/head:pull/10675 PR: https://git.openjdk.org/jdk/pull/10675 From sjohanss at openjdk.org Tue Oct 25 08:41:43 2022 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 25 Oct 2022 08:41:43 GMT Subject: RFR: 8295118: G1: Clear CLD claim marks concurrently [v2] In-Reply-To: References: Message-ID: On Tue, 25 Oct 2022 08:36:11 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this change that moves out clearing CLD marks from the concurrent start pause to the concurrent phase? 
>> >> The idea is that instead of clearing CLD marks just before marking through, clear the marks at the end of the concurrent phases (or at the end of the full gc) so that after that operation marks are reset. >> >> I believe that one can save one of the `ClassLoaderDataGraph::clear_claimed_marks` in full gc by using different claim values (we need the one at the beginning and the end though), but the overhead of that should be minimal compared to actual full gc time. >> >> Testing: tier1-5 > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - sjohanss review fixes > - sjohanss review Thanks, please revert the changes to `g1FullGCCompactTask.*` or at least remove the additional include in the cpp-file before pushing. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.org/jdk/pull/10675 From luhenry at openjdk.org Tue Oct 25 09:23:51 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 25 Oct 2022 09:23:51 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v11] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> <28cw2YaEFxcfjnViCv7Pl8Jk-s8euXhm8NECiS197ig=.1e892088-27fa-4b27-9445-772b7228c8ba@github.com> Message-ID: On Tue, 25 Oct 2022 08:07:21 GMT, Fei Yang wrote: >>> I overall agree with the 64 bytes is an industry standard. What I want to ensure is that we can eagerly enable `UseBlockZeroing` whenever we can. >> >> I strongly disagree. >> >> Until processor reports Zic64b supported we should assume the cache line size could be anything (2^N). I'm running some tests on risc-v fpga core and it has 16-bytes cache block size. >> having to patch ( change default cache line size) and rebuild openjdk to just try new Zicbo{z,p,m} opcodes ( once they are implemented) - isn't a nice way > >> > I overall agree with the 64 bytes is an industry standard. What I want to ensure is that we can eagerly enable `UseBlockZeroing` whenever we can. >> >> I strongly disagree. >> >> Until processor reports Zic64b supported we should assume the cache line size could be anything (2^N). I'm running some tests on risc-v fpga core and it has 16-bytes cache block size. having to patch ( change default cache line size) and rebuild openjdk to just try new Zicbo{z,p,m} opcodes ( once they are implemented) - isn't a nice way > > Now I see the concern here. I agree to have the an option for cache-line size in that respect. I'll revert to what I had to set CacheLineSize then. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Tue Oct 25 09:40:13 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 25 Oct 2022 09:40:13 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v17] In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. 
> > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero Ludovic Henry has updated the pull request incrementally with three additional commits since the last revision: - cleanup - Revert "Enable Block Zeroing when Usez64b is enabled only" This reverts commit 021f53d9ec3d87be2d85b3a159db30e5960c47e0. - Fix prefetch_w encoding Reference at: https://github.com/riscv/riscv-CMOs/blob/master/cmobase/insns/prefetch.i.adoc https://github.com/riscv/riscv-CMOs/blob/master/cmobase/insns/prefetch.r.adoc https://github.com/riscv/riscv-CMOs/blob/master/cmobase/insns/prefetch.w.adoc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10718/files - new: https://git.openjdk.org/jdk/pull/10718/files/3156dc9c..0a12b2c8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10718&range=15-16 Stats: 28 lines in 5 files changed: 16 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/10718.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10718/head:pull/10718 PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Tue Oct 25 10:01:56 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 25 Oct 2022 10:01:56 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v16] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Tue, 25 Oct 2022 04:00:59 GMT, Fei Yang wrote: >> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix cbo_zero encoding > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 2749: > >> 2747: INSN(prefetch_i, 0b0000000000000); >> 2748: INSN(prefetch_r, 0b0000000000001); >> 2749: INSN(prefetch_w, 0b0000000000010); > > Opcode for prefetch_w is wrong? Fixed. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From tschatzl at openjdk.org Tue Oct 25 10:05:58 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Oct 2022 10:05:58 GMT Subject: RFR: 8295118: G1: Clear CLD claim marks concurrently [v3] In-Reply-To: References: Message-ID: <6wrg4yYuD62sBxvoIg3BDpOWmmSsOP7kuxd7JJEsA_s=.38cb889e-f860-457f-a281-afbf10de7e35@github.com> > Hi all, > > can I have reviews for this change that moves out clearing CLD marks from the concurrent start pause to the concurrent phase? > > The idea is that instead of clearing CLD marks just before marking through, clear the marks at the end of the concurrent phases (or at the end of the full gc) so that after that operation marks are reset. > > I believe that one can save one of the `ClassLoaderDataGraph::clear_claimed_marks` in full gc by using different claim values (we need the one at the beginning and the end though), but the overhead of that should be minimal compared to actual full gc time. 
> > Testing: tier1-5 Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: sjohanss review2, remove unnecessary changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10675/files - new: https://git.openjdk.org/jdk/pull/10675/files/4d12b230..293d3af3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10675&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10675&range=01-02 Stats: 12 lines in 2 files changed: 2 ins; 8 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10675.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10675/head:pull/10675 PR: https://git.openjdk.org/jdk/pull/10675 From tschatzl at openjdk.org Tue Oct 25 10:14:50 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Oct 2022 10:14:50 GMT Subject: RFR: 8295118: G1: Clear CLD claim marks concurrently [v4] In-Reply-To: References: Message-ID: > Hi all, > > can I have reviews for this change that moves out clearing CLD marks from the concurrent start pause to the concurrent phase? > > The idea is that instead of clearing CLD marks just before marking through, clear the marks at the end of the concurrent phases (or at the end of the full gc) so that after that operation marks are reset. > > I believe that one can save one of the `ClassLoaderDataGraph::clear_claimed_marks` in full gc by using different claim values (we need the one at the beginning and the end though), but the overhead of that should be minimal compared to actual full gc time. > > Testing: tier1-5 Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10675/files - new: https://git.openjdk.org/jdk/pull/10675/files/293d3af3..be0b9cf3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10675&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10675&range=02-03 Stats: 5 lines in 2 files changed: 2 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10675.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10675/head:pull/10675 PR: https://git.openjdk.org/jdk/pull/10675 From tschatzl at openjdk.org Tue Oct 25 10:17:20 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Oct 2022 10:17:20 GMT Subject: RFR: 8295118: G1: Clear CLD claim marks concurrently [v5] In-Reply-To: References: Message-ID: <0wSWwLYUzzRvT1uKfTQH8qwEE7hz5w6SRSnxQy-wuGc=.94a048d3-baa1-400b-94a9-68d56a1a4ed3@github.com> > Hi all, > > can I have reviews for this change that moves out clearing CLD marks from the concurrent start pause to the concurrent phase? > > The idea is that instead of clearing CLD marks just before marking through, clear the marks at the end of the concurrent phases (or at the end of the full gc) so that after that operation marks are reset. > > I believe that one can save one of the `ClassLoaderDataGraph::clear_claimed_marks` in full gc by using different claim values (we need the one at the beginning and the end though), but the overhead of that should be minimal compared to actual full gc time. 
> > Testing: tier1-5 Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review2, add verification ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10675/files - new: https://git.openjdk.org/jdk/pull/10675/files/be0b9cf3..c15f0a73 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10675&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10675&range=03-04 Stats: 6 lines in 1 file changed: 4 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10675.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10675/head:pull/10675 PR: https://git.openjdk.org/jdk/pull/10675 From ayang at openjdk.org Tue Oct 25 10:51:54 2022 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 25 Oct 2022 10:51:54 GMT Subject: RFR: 8295118: G1: Clear CLD claim marks concurrently [v5] In-Reply-To: <0wSWwLYUzzRvT1uKfTQH8qwEE7hz5w6SRSnxQy-wuGc=.94a048d3-baa1-400b-94a9-68d56a1a4ed3@github.com> References: <0wSWwLYUzzRvT1uKfTQH8qwEE7hz5w6SRSnxQy-wuGc=.94a048d3-baa1-400b-94a9-68d56a1a4ed3@github.com> Message-ID: On Tue, 25 Oct 2022 10:17:20 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this change that moves out clearing CLD marks from the concurrent start pause to the concurrent phase? >> >> The idea is that instead of clearing CLD marks just before marking through, clear the marks at the end of the concurrent phases (or at the end of the full gc) so that after that operation marks are reset. >> >> I believe that one can save one of the `ClassLoaderDataGraph::clear_claimed_marks` in full gc by using different claim values (we need the one at the beginning and the end though), but the overhead of that should be minimal compared to actual full gc time. >> >> Testing: tier1-5 > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > ayang review2, add verification Marked as reviewed by ayang (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10675 From fyang at openjdk.org Tue Oct 25 12:47:57 2022 From: fyang at openjdk.org (Fei Yang) Date: Tue, 25 Oct 2022 12:47:57 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v17] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: <42xVw8227v4zRfPeqCSYCQFhyxIieFdSPouDTLrlQKQ=.378f9a41-2ec2-4b69-9327-8fa9bea1c8be@github.com> On Tue, 25 Oct 2022 09:40:13 GMT, Ludovic Henry wrote: >> Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. >> >> [1] https://github.com/riscv/riscv-CMOs >> [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero > > Ludovic Henry has updated the pull request incrementally with three additional commits since the last revision: > > - cleanup > - Revert "Enable Block Zeroing when Usez64b is enabled only" > > This reverts commit 021f53d9ec3d87be2d85b3a159db30e5960c47e0. > - Fix prefetch_w encoding > > Reference at: > https://github.com/riscv/riscv-CMOs/blob/master/cmobase/insns/prefetch.i.adoc > https://github.com/riscv/riscv-CMOs/blob/master/cmobase/insns/prefetch.r.adoc > https://github.com/riscv/riscv-CMOs/blob/master/cmobase/insns/prefetch.w.adoc Updated change looks good. Thanks for the effort. ------------- Marked as reviewed by fyang (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10718 From luhenry at openjdk.org Tue Oct 25 13:02:13 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 25 Oct 2022 13:02:13 GMT Subject: RFR: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V [v17] In-Reply-To: References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: On Tue, 25 Oct 2022 09:40:13 GMT, Ludovic Henry wrote: >> Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. >> >> [1] https://github.com/riscv/riscv-CMOs >> [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero > > Ludovic Henry has updated the pull request incrementally with three additional commits since the last revision: > > - cleanup > - Revert "Enable Block Zeroing when Usez64b is enabled only" > > This reverts commit 021f53d9ec3d87be2d85b3a159db30e5960c47e0. > - Fix prefetch_w encoding > > Reference at: > https://github.com/riscv/riscv-CMOs/blob/master/cmobase/insns/prefetch.i.adoc > https://github.com/riscv/riscv-CMOs/blob/master/cmobase/insns/prefetch.r.adoc > https://github.com/riscv/riscv-CMOs/blob/master/cmobase/insns/prefetch.w.adoc Failures in `linux-cross-compile` are due to a failure to install gcc. It builds and pass `hotspot:tier1_compiler` tests locally. ------------- PR: https://git.openjdk.org/jdk/pull/10718 From aph at openjdk.org Tue Oct 25 13:33:56 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 25 Oct 2022 13:33:56 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic I now have some performance results. `java.lang.foreign.CallOverheadConstant` is the test that I used to measure JNI overhead. At present, without `-XX:+RestoreMXCSROnJNICalls`, it looks like this: Benchmark Mode Cnt Score Error Units CallOverheadConstant.jni_blank avgt 40 9.968 ? 0.037 ns/op CallOverheadConstant.panama_blank avgt 40 8.745 ? 0.012 ns/op Enabling `-XX:+RestoreMXCSROnJNICalls` makes the overhead much worse: Benchmark Mode Cnt Score Error Units CallOverheadConstant.jni_blank avgt 40 14.741 ? 0.031 ns/op CallOverheadConstant.panama_blank avgt 40 14.620 ? 
0.022 ns/op and with JMH perfasm we can see why: 0x00007f9f43d5698d: sub rsp,0x8 1.56% 0x00007f9f43d56991: vstmxcsr DWORD PTR [rsp] 25.01% 0x00007f9f43d56996: mov eax,DWORD PTR [rsp] 11.09% 0x00007f9f43d56999: and eax,0xffc0 0x00007f9f43d5699e: cmp eax,DWORD PTR [rip+0xe02d234] # 0x00007f9f51d83bd8 That adds 50% to the total JNI overhead. 70% to the Panama overhead. 25% of the total elapsed time is MXCSR! Reading MXCSR is expensive. So we don't do that. So, after a lot of head scratching, I've invented an instruction sequence which doesn't read MXCSR but does a little arithmetic, and `-XX:+RestoreMXCSROnJNICalls` is: CallOverheadConstant.jni_blank avgt 40 10.675 ? 0.100 ns/op CallOverheadConstant.panama_blank avgt 40 10.284 ? 0.018 ns/op Which is 7% added overhead for JNI, 17% for Panama. 1ns is 3.5 machine cycles: that's a bit less than the latency of a load from L1 cache. I'm wondering if I could get away with fixing `RestoreMXCSROnJNICalls` and turning it on by default. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Tue Oct 25 14:27:31 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 25 Oct 2022 14:27:31 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Thu, 20 Oct 2022 20:26:47 GMT, Vladimir Ivanov wrote: > The GCC bugs with `-ffast-math` only corrupts `FTZ` and `DAZ`. > > But `RC` and exception masks may be corrupted as well the same way and I believe the consequences are be similar (silent divergence in results during FP computations). I think we can catch the things that are likely, and will result in silent corruption. We should limit this, I think, to rounding modes and denormals-to-zero. I don't think we should bother with exception masks. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Tue Oct 25 14:35:34 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 25 Oct 2022 14:35:34 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Tue, 25 Oct 2022 13:29:52 GMT, Andrew Haley wrote: > Enabling `-XX:+RestoreMXCSROnJNICalls` makes the overhead much worse: > > ``` > Benchmark Mode Cnt Score Error Units > CallOverheadConstant.jni_blank avgt 40 14.741 ? 0.031 ns/op > CallOverheadConstant.panama_blank avgt 40 14.620 ? 0.022 ns/op > ``` This is Zen+, by the way: Latency of stmxcsr ?18. Better than some x86_64, worse than others. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From fweimer at openjdk.org Tue Oct 25 14:50:51 2022 From: fweimer at openjdk.org (Florian Weimer) Date: Tue, 25 Oct 2022 14:50:51 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). 
This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic Sorry, I feel like this has gone a bit off track. It started as some hardening for `loadLibrary`, but now it's about making all JNI calls a bit slower? Is there any data to suggest that this is necessary? Would it be possible to capture some FPU state evidence in crash dumps, as an alternative? ------------- PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Tue Oct 25 15:02:48 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 25 Oct 2022 15:02:48 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <0qwgnGBvOdE2FUIyQpm_XOfwrN5vcR__ttweS3M7PeI=.df954f56-7c6a-4c5b-9db2-d02d1ddb5042@github.com> On Tue, 25 Oct 2022 14:46:57 GMT, Florian Weimer wrote: > Sorry, I feel like this has gone a bit off track. It started as some hardening for `loadLibrary`, but now it's about making all JNI calls a bit slower? Is there any data to suggest that this is necessary? > > Would it be possible to capture some FPU state evidence in crash dumps, as an alternative? Really? The problem is that when certain libraries are loaded, we get silently corrupted results. Vladimir Ivanov pointed out that the weakness applies to any JNI call, and we wondered if it might be possible to make the workaround so cheap that we could leave it on by default. IMVHO we probably could: the additional overhead is about 1-1.5ns. The only way to measure it is carefully written tests against a JNI call that does nothing. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Tue Oct 25 15:10:46 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 25 Oct 2022 15:10:46 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic And this is Xeon CPU E5-2430 (Ivy Bridge-EN) @ 2.20GHz, with my `-XX:+RestoreMXCSROnJNICalls` code: Benchmark Mode Cnt Score Error Units CallOverheadConstant.jni_blank avgt 40 16.669 ? 0.011 ns/op CallOverheadConstant.panama_blank avgt 40 15.262 ? 0.052 ns/op Benchmark Mode Cnt Score Error Units CallOverheadConstant.jni_blank avgt 40 18.015 ? 1.671 ns/op CallOverheadConstant.panama_blank avgt 40 16.658 ? 0.566 ns/op ------------- PR: https://git.openjdk.org/jdk/pull/10661 From aph at openjdk.org Tue Oct 25 15:10:47 2022 From: aph at openjdk.org (Andrew Haley) Date: Tue, 25 Oct 2022 15:10:47 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <6lFJnHoENcGxQteeh2YMUl3C97dmey32CR7rrX52T0Q=.43bb9040-097f-44c1-9b9a-12ff997575e0@github.com> On Tue, 18 Oct 2022 07:46:35 GMT, Florian Weimer wrote: > I wonder if something that focuses on diagnostic tools might be better here, particularly if there hasn't been any reported breakage. The `dlopen` protection is of course very incomplete because any JNI call can change the state in unexpected ways. > > On the other hand, it seems unlikely that this change breaks some undefined but intended use of the FPU state because if it is changed in `dlopen`, it's not going to be propagated across threads. Why is that unlikely? `System.loadLibrary` runs on a Java thread. That kinda makes it worse, because arithmetic results are different depending on which thread is in use. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From tschatzl at openjdk.org Tue Oct 25 16:21:46 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Oct 2022 16:21:46 GMT Subject: RFR: 8295118: G1: Clear CLD claim marks concurrently [v5] In-Reply-To: <0wSWwLYUzzRvT1uKfTQH8qwEE7hz5w6SRSnxQy-wuGc=.94a048d3-baa1-400b-94a9-68d56a1a4ed3@github.com> References: <0wSWwLYUzzRvT1uKfTQH8qwEE7hz5w6SRSnxQy-wuGc=.94a048d3-baa1-400b-94a9-68d56a1a4ed3@github.com> Message-ID: On Tue, 25 Oct 2022 10:17:20 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this change that moves out clearing CLD marks from the concurrent start pause to the concurrent phase? >> >> The idea is that instead of clearing CLD marks just before marking through, clear the marks at the end of the concurrent phases (or at the end of the full gc) so that after that operation marks are reset. >> >> I believe that one can save one of the `ClassLoaderDataGraph::clear_claimed_marks` in full gc by using different claim values (we need the one at the beginning and the end though), but the overhead of that should be minimal compared to actual full gc time. >> >> Testing: tier1-5 > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > ayang review2, add verification Another tier1-5 run seems okay. Thanks @walulyai @albertnetymk @kstefanj for your reviews. 
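For readers who want the shape of the CLD change in code: a minimal, self-contained C++ sketch of the claim-mark pattern described in the RFR above. It is purely illustrative (toy types, not HotSpot's ClassLoaderData/ClassLoaderDataGraph code); the only point it makes is the ordering, i.e. that the reset happens at the end of the concurrent cycle instead of in the concurrent-start pause.

```c++
#include <atomic>
#include <cstdio>
#include <vector>

// Each node of the graph carries a claim flag so that concurrent workers
// process it at most once per marking cycle.
struct Node {
  std::atomic<int> claimed{0};
  bool try_claim() {
    int expected = 0;
    return claimed.compare_exchange_strong(expected, 1);
  }
};

int main() {
  std::vector<Node> graph(16);

  // Concurrent-start pause: nothing to clear here, because the previous
  // cycle already left every claim flag at zero.
  for (Node& n : graph) {
    if (n.try_claim()) {
      // scan this node's roots exactly once
    }
  }

  // End of the concurrent cycle (or of a full collection): reset the claim
  // flags now, so the next pause does not have to pay for the clearing.
  for (Node& n : graph) {
    n.claimed.store(0, std::memory_order_relaxed);
  }

  std::printf("claim flags reset for the next cycle\n");
  return 0;
}
```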
------------- PR: https://git.openjdk.org/jdk/pull/10675 From tschatzl at openjdk.org Tue Oct 25 16:23:59 2022 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Oct 2022 16:23:59 GMT Subject: Integrated: 8295118: G1: Clear CLD claim marks concurrently In-Reply-To: References: Message-ID: On Wed, 12 Oct 2022 12:49:42 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that moves out clearing CLD marks from the concurrent start pause to the concurrent phase? > > The idea is that instead of clearing CLD marks just before marking through, clear the marks at the end of the concurrent phases (or at the end of the full gc) so that after that operation marks are reset. > > I believe that one can save one of the `ClassLoaderDataGraph::clear_claimed_marks` in full gc by using different claim values (we need the one at the beginning and the end though), but the overhead of that should be minimal compared to actual full gc time. > > Testing: tier1-5 This pull request has now been integrated. Changeset: 5c4d99a0 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/5c4d99a05185cc5fc41691fd62102f3b5bbefc50 Stats: 56 lines in 9 files changed: 17 ins; 29 del; 10 mod 8295118: G1: Clear CLD claim marks concurrently Reviewed-by: iwalulya, sjohanss, ayang ------------- PR: https://git.openjdk.org/jdk/pull/10675 From kbarrett at openjdk.org Tue Oct 25 17:32:57 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 25 Oct 2022 17:32:57 GMT Subject: RFR: 8295808: GrowableArray should support capacity management In-Reply-To: References: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> Message-ID: On Mon, 24 Oct 2022 08:11:42 GMT, Axel Boldt-Christmas wrote: > Any thoughts of moving `expand_to` and `shrink_to_fit` to a common function, given that they share a lot of logic. Something like a general resize that [...] You are right that there are some opportunities for factoring out common code, though probably not as a general resize operation. (And that name already has a different meaning for std::vector, so I would probably look for a different name anyway.) I'm going to address that as a followup issue. ------------- PR: https://git.openjdk.org/jdk/pull/10827 From kbarrett at openjdk.org Tue Oct 25 17:43:37 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 25 Oct 2022 17:43:37 GMT Subject: RFR: 8295808: GrowableArray should support capacity management [v2] In-Reply-To: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> References: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> Message-ID: > Please review this change to GrowableArray to support capacity management. > Two functions are added to GrowableArray, reserve and shrink_to_fit. Also > renamed the max_length function to capacity. > > Used these new functions in StringDedupTable. > > Testing: mach5 tier1-3 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge branch 'master' into ga-capacity - copyrights - use reserve/shrink_to_fit in StringDedupTable - gtests for capacity functions - add reserve and shrink_to_fit - max_length() => capacity() - initial_capacity => capacity - capacity nomenclature ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10827/files - new: https://git.openjdk.org/jdk/pull/10827/files/08f119ca..aac22e47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10827&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10827&range=00-01 Stats: 115329 lines in 565 files changed: 80485 ins; 5038 del; 29806 mod Patch: https://git.openjdk.org/jdk/pull/10827.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10827/head:pull/10827 PR: https://git.openjdk.org/jdk/pull/10827 From kbarrett at openjdk.org Tue Oct 25 17:43:38 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 25 Oct 2022 17:43:38 GMT Subject: RFR: 8295808: GrowableArray should support capacity management [v2] In-Reply-To: References: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> Message-ID: On Mon, 24 Oct 2022 08:11:42 GMT, Axel Boldt-Christmas wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' into ga-capacity >> - copyrights >> - use reserve/shrink_to_fit in StringDedupTable >> - gtests for capacity functions >> - add reserve and shrink_to_fit >> - max_length() => capacity() >> - initial_capacity => capacity >> - capacity nomenclature > > LGTM. > > Any thoughts of moving `expand_to` and `shrink_to_fit` to a common function, given that they share a lot of logic. > Something like a general resize that > * Allocates if `new_capacity != 0` > * Copy constructs new elements from `0` to `min(old_capacity, new_capacity)` > * Default constructs from `min(old_capacity, new_capacity)` to `new_capacity` > * Destroy old elements from `0` to `old_capacity` Thanks @xmas92 , @tschatzl , and @sspitsyn for reviews. ------------- PR: https://git.openjdk.org/jdk/pull/10827 From kbarrett at openjdk.org Tue Oct 25 17:46:46 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 25 Oct 2022 17:46:46 GMT Subject: Integrated: 8295808: GrowableArray should support capacity management In-Reply-To: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> References: <1j-mFLfOtV7qwFjaFUao1eDe2cSjXvSHCAZEKS5JbaA=.81c5de03-d403-4d7c-8318-04d7bee9462a@github.com> Message-ID: On Sat, 22 Oct 2022 01:38:44 GMT, Kim Barrett wrote: > Please review this change to GrowableArray to support capacity management. > Two functions are added to GrowableArray, reserve and shrink_to_fit. Also > renamed the max_length function to capacity. > > Used these new functions in StringDedupTable. > > Testing: mach5 tier1-3 This pull request has now been integrated. 
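Since the description above is terse about what reserve and shrink_to_fit mean, here is a small self-contained C++ sketch of the same capacity vocabulary. It is illustrative only and deliberately does not mirror HotSpot's GrowableArray internals or allocators; the IntArray name and its layout are made up for the example.

```c++
#include <cstdio>
#include <cstring>

// Toy dynamic array that, like the change above, distinguishes length
// (elements in use) from capacity (allocated slots).
class IntArray {
  int* _data = nullptr;
  int  _len = 0;
  int  _cap = 0;

  void reallocate(int new_cap) {
    int* d = new int[new_cap];
    if (_len > 0) std::memcpy(d, _data, sizeof(int) * _len);
    delete[] _data;
    _data = d;
    _cap = new_cap;
  }

public:
  IntArray() = default;
  IntArray(const IntArray&) = delete;
  IntArray& operator=(const IntArray&) = delete;
  ~IntArray() { delete[] _data; }

  void append(int v) {
    if (_len == _cap) reallocate(_cap == 0 ? 4 : 2 * _cap);  // grow geometrically
    _data[_len++] = v;
  }

  // Guarantee room for at least new_cap elements; never shrinks.
  void reserve(int new_cap) {
    if (new_cap > _cap) reallocate(new_cap);
  }

  // Release the slack so capacity matches the current length.
  void shrink_to_fit() {
    if (_cap > _len) reallocate(_len);
  }

  int length() const   { return _len; }
  int capacity() const { return _cap; }
};

int main() {
  IntArray a;
  a.reserve(1024);                          // pre-size before a bulk insert
  for (int i = 0; i < 100; i++) a.append(i);
  a.shrink_to_fit();                        // drop unused slots once the table is stable
  std::printf("length=%d capacity=%d\n", a.length(), a.capacity());
  return 0;
}
```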
Changeset: 3a873d3c Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/3a873d3c5b2281b2389e9364ff26f04ee86b0607 Stats: 195 lines in 8 files changed: 90 ins; 19 del; 86 mod 8295808: GrowableArray should support capacity management Reviewed-by: aboldtch, tschatzl, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/10827 From iveresov at openjdk.org Tue Oct 25 20:00:26 2022 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 25 Oct 2022 20:00:26 GMT Subject: RFR: 8295066: Folding of loads is broken in C2 after JDK-8242115 Message-ID: The fix does two things: 1. Allow folding of pinned loads to constants with a straight line data flow (no phis). 2. Make scalarization aware of the new shape of the barriers so that pre-loads can be ignored. Testing is clean, Valhalla testing is clean too. ------------- Commit messages: - Add test - Fix scalarization - Allow direct constant folding Changes: https://git.openjdk.org/jdk/pull/10861/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10861&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295066 Stats: 260 lines in 9 files changed: 178 ins; 46 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/10861.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10861/head:pull/10861 PR: https://git.openjdk.org/jdk/pull/10861 From luhenry at openjdk.org Tue Oct 25 20:15:53 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 25 Oct 2022 20:15:53 GMT Subject: Integrated: 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V In-Reply-To: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> References: <29V6oUISogNjtVLepzSXpMg9PQPjjd3N7J5PiIgDoiU=.4abb09f7-a3a2-4163-ac76-ed7854bd4e8b@github.com> Message-ID: <_-VIL85f1FYdZKu6RMAUiKT7bDpJxluDwqJfyNkunXU=.5f924bb6-1c1a-4e99-8807-cc81008d8ab3@github.com> On Sat, 15 Oct 2022 14:23:01 GMT, Ludovic Henry wrote: > Similarly to AArch64 DC.ZVA, the RISC-V Zicboz [1] extension provides the cbo.zero [2] instruction that allows to zero out memory a cache-line at a time. This should be faster than storing zeroes 64bits at a time. > > [1] https://github.com/riscv/riscv-CMOs > [2] https://github.com/riscv/riscv-CMOs/blob/master/cmobase/Zicboz.adoc#insns-cbo_zero This pull request has now been integrated. Changeset: e0c29307 Author: Ludovic Henry Committer: Vladimir Kempik URL: https://git.openjdk.org/jdk/commit/e0c29307f7b35149aacae0bb935aa9fe524cff72 Stats: 202 lines in 9 files changed: 186 ins; 3 del; 13 mod 8295282: Use Zicboz/cbo.zero to zero-out memory on RISC-V Reviewed-by: yadongwang, vkempik, fyang ------------- PR: https://git.openjdk.org/jdk/pull/10718 From matsaave at openjdk.org Tue Oct 25 21:29:40 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 25 Oct 2022 21:29:40 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries Message-ID: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. The text format and contents are tentative, please review. 
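As an illustration of the general technique being proposed, here is a self-contained C++ example of decoding a packed flags word into labelled, human-readable fields. The bit layout below is invented for the example and is not the real ConstantPoolCacheEntry flags format.

```c++
#include <cstdint>
#include <cstdio>

// Hypothetical packed layout (low byte: parameter count, two single-bit flags,
// top nibble: top-of-stack state). Not the HotSpot layout.
struct DecodedFlags {
  unsigned tos_state;
  bool     has_appendix;
  bool     is_final;
  unsigned num_parameters;
};

static DecodedFlags decode(uint32_t flags) {
  DecodedFlags d;
  d.num_parameters = flags & 0xff;
  d.has_appendix   = (flags >> 8) & 1;
  d.is_final       = (flags >> 9) & 1;
  d.tos_state      = (flags >> 28) & 0xf;
  return d;
}

int main() {
  uint32_t raw = 0x80000302;  // example value
  DecodedFlags d = decode(raw);
  // Print the raw word followed by the decoded fields, mirroring the
  // "quick translation plus full detail" idea in the RFR.
  std::printf("flags: 0x%08x\n", (unsigned) raw);
  std::printf(" - tos state:      %u\n", d.tos_state);
  std::printf(" - has appendix:   %s\n", d.has_appendix ? "yes" : "no");
  std::printf(" - final:          %s\n", d.is_final ? "yes" : "no");
  std::printf(" - num parameters: %u\n", d.num_parameters);
  return 0;
}
```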
------------- Commit messages: - fixed last trailing whitespace - Removed trailing whitespace - Fixed merge conflicts - 8295893: Improve printing of Constant Pool Cache Entries Changes: https://git.openjdk.org/jdk/pull/10860/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295893 Stats: 46 lines in 1 file changed: 24 ins; 3 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/10860.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10860/head:pull/10860 PR: https://git.openjdk.org/jdk/pull/10860 From coleenp at openjdk.org Tue Oct 25 21:42:21 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Oct 2022 21:42:21 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: On Tue, 25 Oct 2022 19:37:12 GMT, Matias Saavedra Silva wrote: > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. Can you post an example how it looks for method and field entries ? ------------- PR: https://git.openjdk.org/jdk/pull/10860 From kvn at openjdk.org Tue Oct 25 22:06:17 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 25 Oct 2022 22:06:17 GMT Subject: RFR: 8295066: Folding of loads is broken in C2 after JDK-8242115 In-Reply-To: References: Message-ID: <5VWY6hlnoGyt8nqJMnX14qp7bpCvm4G1enchLM6NGT8=.f3a1b91d-12fb-4422-99ff-cc0dcbf669c5@github.com> On Tue, 25 Oct 2022 19:50:10 GMT, Igor Veresov wrote: > The fix does two things: > > 1. Allow folding of pinned loads to constants with a straight line data flow (no phis). > 2. Make scalarization aware of the new shape of the barriers so that pre-loads can be ignored. > > Testing is clean, Valhalla testing is clean too. Looks good. Please, test full first 3 tier1-3 (not just hs-tier*). ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10861 From jnimeh at openjdk.org Tue Oct 25 22:09:49 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Tue, 25 Oct 2022 22:09:49 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Mon, 24 Oct 2022 22:09:29 GMT, vpaprotsk wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. 
>> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: > > extra whitespace character src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 171: > 169: } > 170: > 171: if (len >= 1024) { Out of curiosity, do you have any perf numbers for the impact of this change on systems that do not support AVX512? Does this help or hurt (or make a negligible impact) on poly1305 updates when the input is 1K or larger? src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 296: > 294: keyBytes[12] &= (byte)252; > 295: > 296: // This should be enabled, but Poly1305KAT would fail I'm on the fence about this change. I have no problem with it in basic terms. If we ever decided to make this a general purpose Mac in JCE then this would definitely be good to do. As of right now, the only consumer is ChaCha20 and it would submit a key through the process in the RFC. Seems really unlikely to run afoul of these checks, but admittedly not impossible. I would agree with @sviswa7 that we could examine this in a separate change and we could look at other approaches to getting around the KAT issue, perhaps some package-private based way to disable the check. As long as Poly1305 remains with package-private visibility, one could make another form of the constructor with a boolean that would disable this check and that is the constructor that the KAT would use. This is just an off-the-cuff idea, but one way we might get the best of both worlds. If we move this down the road then we should remove the commenting. We can refer back to this PR later. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From matsaave at openjdk.org Tue Oct 25 22:27:20 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 25 Oct 2022 22:27:20 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: On Tue, 25 Oct 2022 19:37:12 GMT, Matias Saavedra Silva wrote: > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. 
> > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 > 10 invokedynamic bsm=31 13 > BSM: REF_invokeStatic 32 > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 > 18 iinc #1 1 > 21 goto 2 > 24 return Calling findmethod will result in this output for an invokedynamic call: (gdb) call findmethod("Concat0", "main", 0x8) "Executing findmethod" flags (bitmask): 0x01 - print names of methods 0x02 - print bytecodes 0x04 - print the address of bytecodes 0x08 - print info for invokedynamic 0x10 - print info for invokehandle [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V 0 iconst_0 1 istore_1 2 iload_1 3 iconst_2 4 if_icmpge 24 7 getstatic 7 10 invokedynamic bsm=31 13 BSM: REF_invokeStatic 32 arguments[1] = { 000 } ConstantPoolCacheEntry: 4 - this: 0x00007fffa0400570 - bytecode 1: invokedynamic ba - bytecode 2: nop 00 - cp index: 13 - F1: [ 0x00000008000c8658] - F2: [ 0x0000000000000003] - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, 
java.lang.Object) - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] - tos: object - local signature: 1 - has appendix: 1 - forced virtual: 0 - final: 1 - virtual Final: 0 - resolution Failed: 0 - num Parameters: 02 Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; appendix: java.lang.invoke.BoundMethodHandle$Species_LL {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' - ---- fields (total size 5 words): - private 'customizationCount' 'B' @12 0 (0x00) - private volatile 'updateInProgress' 'Z' @13 false (0x00) - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) ------------- 15 putstatic 17 18 iinc #1 1 21 goto 2 24 return You can see that the flag value is currently decoded twice, once as a quick translation and once in full detail. Printing will look nearly identical for a field but with different flags and no appendix. ------------- PR: https://git.openjdk.org/jdk/pull/10860 From dholmes at openjdk.org Tue Oct 25 22:44:42 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 25 Oct 2022 22:44:42 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: On Tue, 25 Oct 2022 19:37:12 GMT, Matias Saavedra Silva wrote: > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. 
> > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 > 10 invokedynamic bsm=31 13 > BSM: REF_invokeStatic 32 > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 > 18 iinc #1 1 > 21 goto 2 > 24 return Hi Matias, A quick skim through showed a couple of minor issues to fix. I don't really use this output so I can't say whether the expanded format is helpful or too verbose (there are certainly a lot more lines printed now). Thanks. src/hotspot/share/oops/cpCache.cpp line 637: > 635: st->print_cr(" - F1: [ " PTR_FORMAT "]", (intptr_t)_f1); > 636: st->print_cr(" - F2: [ " PTR_FORMAT "]", (intptr_t)_f2); > 637: st->print_cr(" - Method: " INTPTR_FORMAT " %s", p2i(m), m->external_name()); You need a NULL check on m src/hotspot/share/oops/cpCache.cpp line 657: > 655: p2i(m), > 656: m->method_holder()->name()->as_C_string(), > 657: m->name()->as_C_string(), m->signature()->as_C_string()); You removed the ResourceMark needed by `as_C_string` so they won't get cleaned up at this level (and will cause a failure if there isn't a RM somewhere in the call stack). src/hotspot/share/oops/cpCache.cpp line 665: > 663: } > 664: } > 665: else if (is_field_entry()) { Is there a third alternative? 
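To make the two cpCache.cpp review comments above concrete, here is a toy, self-contained C++ model of the pattern being asked for: check the method pointer for NULL before printing, and scope the temporary C strings with a mark that releases them. The ResourceArea/ResourceMarkToy types below are simplified stand-ins, not HotSpot's ResourceMark, and the Method struct is invented for the example.

```c++
#include <cstdio>
#include <cstring>
#include <vector>

// A toy "resource area": bump allocation, released when the mark taken at
// scope entry is destroyed. HotSpot's ResourceMark follows the same shape.
struct ResourceArea {
  std::vector<char> buf = std::vector<char>(4096);
  size_t top = 0;
  char* alloc(size_t n) { char* p = &buf[top]; top += n; return p; }  // no bounds check; toy only
};

struct ResourceMarkToy {
  ResourceArea& area;
  size_t saved;
  explicit ResourceMarkToy(ResourceArea& a) : area(a), saved(a.top) {}
  ~ResourceMarkToy() { area.top = saved; }  // frees everything allocated since the mark
};

static ResourceArea g_area;

// Stand-in for Symbol::as_C_string(): returns a C string allocated in the area.
static char* as_c_string(const char* s) {
  char* p = g_area.alloc(std::strlen(s) + 1);
  std::strcpy(p, s);
  return p;
}

struct Method { const char* name; };

static void print_method(const Method* m) {
  if (m == nullptr) {             // first comment: guard the unresolved/NULL case
    std::printf(" - Method: NULL\n");
    return;
  }
  ResourceMarkToy rm(g_area);     // second comment: scope the temporary strings locally
  std::printf(" - Method: %s\n", as_c_string(m->name));
}

int main() {
  Method m{"java.lang.Object.hashCode()"};
  print_method(&m);
  print_method(nullptr);
  std::printf("area top after printing: %zu\n", g_area.top);  // back to 0
  return 0;
}
```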
------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/10860 From lmesnik at openjdk.org Tue Oct 25 22:50:29 2022 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 25 Oct 2022 22:50:29 GMT Subject: Integrated: 8294486: Remove vmTestbase/nsk/jvmti/ tests ported to serviceability/jvmti. In-Reply-To: References: Message-ID: On Tue, 11 Oct 2022 19:00:11 GMT, Leonid Mesnik wrote: > The fix removes nsk/jvmti/ tests ported to serviceability/jvmti and forward-ports corresponding fixed. The suspend/resume tests require more work covered by https://bugs.openjdk.org/browse/JDK-8295169. This pull request has now been integrated. Changeset: 3bd3caf8 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/3bd3caf897dcb6d53fae6e94ba1cc281b30277ea Stats: 23360 lines in 227 files changed: 50 ins; 23293 del; 17 mod 8294486: Remove vmTestbase/nsk/jvmti/ tests ported to serviceability/jvmti. Reviewed-by: sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/10665 From vlivanov at openjdk.org Tue Oct 25 23:15:27 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 25 Oct 2022 23:15:27 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <_5t8ZR59czZvHg5wpsd24vGRiKy_unwr5Po1nrq7hec=.8da78b9f-e55a-40e3-bde3-214df1cffbd8@github.com> On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic Thanks a lot for the data, Andrew. I agree with you that the overhead `stmxcsr` introduces makes it almost prohibitive to be unconditionally placed on the hot path on the way back from a JNI call. In contrast, your idea to detect the corruption by checking the result of a carefully chosen FP expression has very modest impact while still being able to catch important types of MXCSR corruption. I fully support having it turned on by default for JNI calls. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From sviswanathan at openjdk.org Tue Oct 25 23:52:26 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 25 Oct 2022 23:52:26 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Mon, 24 Oct 2022 22:09:29 GMT, vpaprotsk wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. 
>> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: > > extra whitespace character src/hotspot/cpu/x86/macroAssembler_x86_poly.cpp line 806: > 804: evmovdquq(A0, Address(rsp, 64*0), Assembler::AVX_512bit); > 805: evmovdquq(A0, Address(rsp, 64*1), Assembler::AVX_512bit); > 806: evmovdquq(A0, Address(rsp, 64*2), Assembler::AVX_512bit); This is load from stack into A0. Did you intend to store A0 (cleanup) into stack local area here? I think the source and destination are mixed up here. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From david.holmes at oracle.com Wed Oct 26 01:53:17 2022 From: david.holmes at oracle.com (David Holmes) Date: Wed, 26 Oct 2022 11:53:17 +1000 Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: <0qwgnGBvOdE2FUIyQpm_XOfwrN5vcR__ttweS3M7PeI=.df954f56-7c6a-4c5b-9db2-d02d1ddb5042@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <0qwgnGBvOdE2FUIyQpm_XOfwrN5vcR__ttweS3M7PeI=.df954f56-7c6a-4c5b-9db2-d02d1ddb5042@github.com> Message-ID: <18a7522c-c444-f908-d06d-681a9816cec5@oracle.com> On 26/10/2022 1:02 am, Andrew Haley wrote: > On Tue, 25 Oct 2022 14:46:57 GMT, Florian Weimer wrote: > >> Sorry, I feel like this has gone a bit off track. It started as some hardening for `loadLibrary`, but now it's about making all JNI calls a bit slower? Is there any data to suggest that this is necessary? >> >> Would it be possible to capture some FPU state evidence in crash dumps, as an alternative? > > Really? The problem is that when certain libraries are loaded, we get silently corrupted results. Vladimir Ivanov pointed out that the weakness applies to any JNI call, and we wondered if it might be possible to make the workaround so cheap that we could leave it on by default. IMVHO we probably could: the additional overhead is about 1-1.5ns. The only way to measure it is carefully written tests against a JNI call that does nothing. The loadlibrary issue is a concrete issue with a simple and localised solution. 
The extension to "well this could potentially happen on any JNI call if it messed with FP state" is a theoretical problem that has always been there. If it were free to fix then sure lets be super conservative, but it seems to me we are going to penalize everyone (and we just went to some effort to produce extremely fast trivial JNI calls) to account for something nobody has any evidence is happening - and if it did happen then the library being used should be fixed. I don't see why we should make everyone pay for this "just in case". At most an expanded -Xcheck:jni check for FP-state manipulation, with an enhanced/fixed RestoreMXCSROnJNICalls is in order IMVHO. YMMV. Cheers, David ----- > ------------- > > PR: https://git.openjdk.org/jdk/pull/10661 From jrose at openjdk.org Wed Oct 26 02:27:29 2022 From: jrose at openjdk.org (John R Rose) Date: Wed, 26 Oct 2022 02:27:29 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: <_5t8ZR59czZvHg5wpsd24vGRiKy_unwr5Po1nrq7hec=.8da78b9f-e55a-40e3-bde3-214df1cffbd8@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <_5t8ZR59czZvHg5wpsd24vGRiKy_unwr5Po1nrq7hec=.8da78b9f-e55a-40e3-bde3-214df1cffbd8@github.com> Message-ID: On Tue, 25 Oct 2022 23:13:08 GMT, Vladimir Ivanov wrote: > ?very modest impact while still being able to catch important types of MXCSR corruption. I fully support having it turned on by default for JNI calls. I guess I agree. With the clever test for the bad mode Java cares about, the overhead is small compared to an empty JNI call, and very small compared to any normally non-empty JNI call. Now I'm curious: What's this magic code? Does it multiply a couple of well-chosen constants and test for zero? I said "I guess" because I'm not clear on (a) the benefit of adding that nanosecond (other than preserving denorms against a rare system fault), nor on (b) what are the remaining faults not checked for, but which a more expensive MSR spill/restore would fix. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From iveresov at openjdk.org Wed Oct 26 04:19:23 2022 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 26 Oct 2022 04:19:23 GMT Subject: RFR: 8295066: Folding of loads is broken in C2 after JDK-8242115 In-Reply-To: <5VWY6hlnoGyt8nqJMnX14qp7bpCvm4G1enchLM6NGT8=.f3a1b91d-12fb-4422-99ff-cc0dcbf669c5@github.com> References: <5VWY6hlnoGyt8nqJMnX14qp7bpCvm4G1enchLM6NGT8=.f3a1b91d-12fb-4422-99ff-cc0dcbf669c5@github.com> Message-ID: On Tue, 25 Oct 2022 22:02:54 GMT, Vladimir Kozlov wrote: > Please, test full first 3 tier1-3 (not just hs-tier*). Done. Looks good. ------------- PR: https://git.openjdk.org/jdk/pull/10861 From kvn at openjdk.org Wed Oct 26 04:47:24 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 26 Oct 2022 04:47:24 GMT Subject: RFR: 8295066: Folding of loads is broken in C2 after JDK-8242115 In-Reply-To: References: <5VWY6hlnoGyt8nqJMnX14qp7bpCvm4G1enchLM6NGT8=.f3a1b91d-12fb-4422-99ff-cc0dcbf669c5@github.com> Message-ID: On Wed, 26 Oct 2022 04:15:39 GMT, Igor Veresov wrote: > > Please, test full first 3 tier1-3 (not just hs-tier*). > > Done. Looks good. Thank you for running them. 
------------- PR: https://git.openjdk.org/jdk/pull/10861 From xlinzheng at openjdk.org Wed Oct 26 05:05:26 2022 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 26 Oct 2022 05:05:26 GMT Subject: RFR: 8295646: Ignore zero pairs in address descriptors read by dwarf parser [v2] In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 12:11:30 GMT, Xiaolin Zheng wrote: >> RISC-V generates debuginfo like >> >> >>> readelf --debug-dump=aranges build/linux-riscv64-server-fastdebug/images/test/hotspot/gtest/server/libjvm.so >> >> ... >> Length: 1756 >> Version: 2 >> Offset into .debug_info: 0x4bc5e9 >> Pointer Size: 8 >> Segment Size: 0 >> >> Address Length >> 0000000000344ece 0000000000004a2c >> 0000000000000000 0000000000000000 <= >> 0000000000000000 0000000000000000 <= >> 0000000000000000 0000000000000000 <= >> 00000000003498fa 0000000000000016 >> 0000000000349910 0000000000000016 >> .... >> 000000000026d5b8 0000000000000b9a >> 000000000034a532 0000000000000628 >> 000000000034ab5a 00000000000002ac >> 0000000000000000 0000000000000000 <= >> 0000000000000000 0000000000000000 >> 0000000000000000 0000000000000000 >> 000000000034ae06 0000000000000bee >> 000000000034b9f4 0000000000000660 >> 000000000034c054 00000000000005aa >> 0000000000000000 0000000000000000 >> 0000000000000000 0000000000000000 <= >> 000000000034c5fe 0000000000000af2 >> 000000000034d0f0 0000000000000f16 >> 000000000034e006 0000000000000b4a >> 0000000000000000 0000000000000000 >> 0000000000000000 0000000000000000 >> 000000000026e152 000000000000000e >> 0000000000000000 0000000000000000 >> >> >> Our dwarf parser (gdb's dwarf parser before this April is as well [1], which encountered the same issue on RISC-V) uses `address == 0 && size == 0` in `is_terminating_entry()` to detect terminations of an arange section, which will early terminate parsing RISC-V's debuginfo at an "apparent terminator" described in [1] so that the result would not look correct with tests failures. The `_header._unit_length` is read but not used and it is the real length that can determine the section's end, so we can use it to get the end position of a section instead of `address == 0 && size == 0` checks to fix this issue. >> >> Also, the reason why `readelf` has no such issue is it also uses the same approach to determine the end position. [2] >> >> Tests added along with the dwarf parser patch are all tested and passed on x86_64, aarch64, and riscv64. >> Running a tier1 sanity test now. >> >> Thanks, >> Xiaolin >> >> [1] https://github.com/bminor/binutils-gdb/commit/1a7c41d5ece7d0d1aa77d8019ee46f03181854fa >> [2] https://github.com/bminor/binutils-gdb/blob/fd320c4c29c9a1915d24a68a167a5fd6d2c27e60/binutils/dwarf.c#L7594 > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Add the assertion back I may assume this one could go, with the approval from the code author himself? :-) ------------- PR: https://git.openjdk.org/jdk/pull/10758 From thartmann at openjdk.org Wed Oct 26 05:24:23 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 26 Oct 2022 05:24:23 GMT Subject: RFR: 8295066: Folding of loads is broken in C2 after JDK-8242115 In-Reply-To: References: Message-ID: <8rFROVmvN4pO0mGVlXs48VNkJ1c0D7UpiBarJIz7QJg=.31693a6d-f908-473b-bedb-f7cd824efb63@github.com> On Tue, 25 Oct 2022 19:50:10 GMT, Igor Veresov wrote: > The fix does two things: > > 1. Allow folding of pinned loads to constants with a straight line data flow (no phis). > 2. 
Make scalarization aware of the new shape of the barriers so that pre-loads can be ignored. > > Testing is clean, Valhalla testing is clean too. That looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.org/jdk/pull/10861 From fyang at openjdk.org Wed Oct 26 08:19:22 2022 From: fyang at openjdk.org (Fei Yang) Date: Wed, 26 Oct 2022 08:19:22 GMT Subject: RFR: 8295711: Rename ZBarrierSetAssembler::load_at parameter name from "tmp_thread" to "tmp2" In-Reply-To: References: Message-ID: <1To4XVL4EaSqJldh_9z2aF5OXLA0cHzXEW-rXKSV8ec=.9abe5aac-2936-4b0e-a965-2f6ce105cbcf@github.com> On Fri, 21 Oct 2022 07:06:31 GMT, Fei Yang wrote: > > Please also fix the `tmp_thread` parameter for x86 while you are at it ;) > > It looks to me that the case for x86 is different here. For x86_32, this formal parameter will be used later for calling 'get_thread()' [1][2]. So I think we should keep its original name. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp#L573 > > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/shenandoah/shenandoahBarrierSetAssembler_x86.cpp#L573 @tschatzl : Are you OK with this? And any further comments? Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10783 From aph at openjdk.org Wed Oct 26 09:41:23 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 26 Oct 2022 09:41:23 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <6kmxPAm1gTiXlFfR2fL4WUh2bdFMCRBbGy1btd1-3Dc=.04a52b3e-b931-4448-8529-8df7d7529436@github.com> On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic On 10/26/22 03:24, John Rose wrote: > Now I'm curious: What's this magic code? Does it multiply a couple of well-chosen constants and test for zero? static const double unity = 0x1.0p-1020; static const volatile double thresh = 0x0.0000000000003p-1022; if (unity + (thresh) == unity || -unity - (thresh) == -unity) { ... fix the badness Here, thresh is the smallest denormal number that has two bits set. Unity is a number such that, when thresh is added to it, the result must be rounded according to the mode. These two tests detect the rounding mode in use. If denormals are turned off (i.e. denormals-are-zero) it looks like round-to-zero mode is in use. This is essentially the same test that Joe Darcy posted earlier in this thread, but scaled to make thresh so small that it is denormal.
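Spelled out as a complete program, the check above looks like the following (an editorial restatement of the snippet, not the actual HotSpot patch). With the default environment both comparisons are false; setting DAZ (which GCC's -ffast-math startup code sets together with FTZ) or a non-default rounding mode makes at least one of them true.

```c++
#include <cstdio>

static bool fp_mode_looks_broken() {
  // unity is a small normal number; thresh is 3 * 2^-1074, the smallest denormal
  // with two bits set. volatile keeps the compiler from folding the sums away.
  static const double unity = 0x1.0p-1020;
  static const volatile double thresh = 0x0.0000000000003p-1022;
  // Round-to-nearest with denormals enabled: both sums round away from unity,
  // so both comparisons are false. DAZ or a changed rounding mode makes at
  // least one of them true. (FTZ alone, without DAZ, is not caught here.)
  return (unity + thresh == unity) || (-unity - thresh == -unity);
}

int main() {
  std::printf("FP environment %s\n",
              fp_mode_looks_broken()
                  ? "broken for Java semantics (DAZ or rounding mode changed)"
                  : "OK for Java semantics");
  return 0;
}
```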
> I said "I guess" because I'm not clear on (a) the benefit of adding that nanosecond (other than preserving denorms against a rare system fault), nor on (b) what are the remaining faults not checked for, but which a more expensive MSR spill/restore would fix. If someone turned on flush-denormal-results-to-zero but didn't also turn on denormals-are-zero I guess we'd miss that. Otherwise it doesn't matter. I'd like the JVM to be protected against the GCC -ffast-math bug because the resulting JVM fault silently produces incorrect results. We can do that with a test after each dlopen(). But that doesn't work in a robust way: if we call some native code that itself dlopen()s a library compiled with -ffast-math we still lose. Hmm, I guess I could scale the operands a bit more to detect that too... What fun! ------------- PR: https://git.openjdk.org/jdk/pull/10661 From duke at openjdk.org Wed Oct 26 15:30:23 2022 From: duke at openjdk.org (vpaprotsk) Date: Wed, 26 Oct 2022 15:30:23 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Tue, 25 Oct 2022 21:57:34 GMT, Jamil Nimeh wrote: >> vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: >> >> extra whitespace character > > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 296: > >> 294: keyBytes[12] &= (byte)252; >> 295: >> 296: // This should be enabled, but Poly1305KAT would fail > > I'm on the fence about this change. I have no problem with it in basic terms. If we ever decided to make this a general purpose Mac in JCE then this would definitely be good to do. As of right now, the only consumer is ChaCha20 and it would submit a key through the process in the RFC. Seems really unlikely to run afoul of these checks, but admittedly not impossible. > > I would agree with @sviswa7 that we could examine this in a separate change and we could look at other approaches to getting around the KAT issue, perhaps some package-private based way to disable the check. As long as Poly1305 remains with package-private visibility, one could make another form of the constructor with a boolean that would disable this check and that is the constructor that the KAT would use. This is just an off-the-cuff idea, but one way we might get the best of both worlds. > > If we move this down the road then we should remove the commenting. We can refer back to this PR later. I think I will remove the check for now, dont want to hold up reviews. I wasn't sure how to 'inject a backdoor' to the commented out check either, or at least how to do it in an acceptable way. Your ideas do sound plausible, and if anyone does want this check, I can implement one of the ideas (package private boolean flag? turn it on in the test) while waiting for more reviews to come in. The comment about ChaCha being the only way in is also relevant, thanks. i.e. this is a private class today. 
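For context on what the removed check was doing, here is a self-contained sketch in C++ (the real code is the Java in Poly1305.java) of clamping r per RFC 8439 and then rejecting a key whose r or s half is all zero. The function name and surrounding structure are invented for the example; only the clamping constants come from the RFC.

```c++
#include <cstdint>
#include <cstdio>
#include <cstring>

// r = first 16 key bytes (clamped per RFC 8439), s = last 16 key bytes.
// An all-zero r makes the tag independent of the message; the discussion
// above also treats an all-zero s as a key worth rejecting.
static bool poly1305_key_is_degenerate(const uint8_t key[32]) {
  uint8_t r[16], s[16];
  std::memcpy(r, key, 16);
  std::memcpy(s, key + 16, 16);

  // clamp(r): clear the top 4 bits of r[3], r[7], r[11], r[15] and the low
  // 2 bits of r[4], r[8], r[12] (hence the `&= 252` seen in the patch).
  r[3] &= 15;  r[7] &= 15;  r[11] &= 15;  r[15] &= 15;
  r[4] &= 252; r[8] &= 252; r[12] &= 252;

  uint8_t r_acc = 0, s_acc = 0;
  for (int i = 0; i < 16; i++) { r_acc |= r[i]; s_acc |= s[i]; }
  return r_acc == 0 || s_acc == 0;
}

int main() {
  uint8_t zero_key[32] = {0};
  std::printf("all-zero key rejected: %s\n",
              poly1305_key_is_degenerate(zero_key) ? "yes" : "no");
  return 0;
}
```

A KAT that deliberately feeds degenerate vectors could then bypass such a rejection through something like the package-private constructor flag floated above.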
------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Wed Oct 26 15:51:22 2022 From: duke at openjdk.org (vpaprotsk) Date: Wed, 26 Oct 2022 15:51:22 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: <4FY4SEodgFcdxFXvGWFJWHYCr1GD4nAktLa5SiyPcxM=.384b2818-b6c5-4523-8682-5b730d9ad036@github.com> On Tue, 25 Oct 2022 23:48:49 GMT, Sandhya Viswanathan wrote: >> vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: >> >> extra whitespace character > > src/hotspot/cpu/x86/macroAssembler_x86_poly.cpp line 806: > >> 804: evmovdquq(A0, Address(rsp, 64*0), Assembler::AVX_512bit); >> 805: evmovdquq(A0, Address(rsp, 64*1), Assembler::AVX_512bit); >> 806: evmovdquq(A0, Address(rsp, 64*2), Assembler::AVX_512bit); > > This is load from stack into A0. Did you intend to store A0 (cleanup) into stack local area here? I think the source and destination are mixed up here. Wow! Thank you for spotting this ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Wed Oct 26 15:51:23 2022 From: duke at openjdk.org (vpaprotsk) Date: Wed, 26 Oct 2022 15:51:23 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Tue, 25 Oct 2022 21:48:47 GMT, Jamil Nimeh wrote: >> vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: >> >> extra whitespace character > > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 171: > >> 169: } >> 170: >> 171: if (len >= 1024) { > > Out of curiosity, do you have any perf numbers for the impact of this change on systems that do not support AVX512? Does this help or hurt (or make a negligible impact) on poly1305 updates when the input is 1K or larger? (The first commit in this PR actually has the code without the check if anyone wants to measure.. well its also trivial to edit..) I measured about 50% slowdown on 64 byte payloads. One could argue that 64 bytes is not all that representative, but we don't get much out of assembler at that load either so it didn't seem worth it to figure out some sort of platform check. AVX512 needs at least 256 = 16 blocks.. there is overhead also pre-calculating powers of R that needs to be amortized. Assembler does fall back to 64-bit multiplies for <256, while the Java version will have to use the 32-bit multiplies. <256, purely scalar, non-vector, 64 vs 32 is not _that_ big an issue though; the algorithm is plenty happy with 26-bit limbs, and whatever the benefit of 64, it gets erased by the interface-matching code copying limbs in and out.. Right now, I measured 1k with `-XX:-UsePolyIntrinsics` to be about 10% slower. I think its acceptable, in order to get 18x? Most/all of the slowdown comes from this need of copying limbs out/in.. I am looking at perhaps copying limbs out in the intrinsic instead. Not very 'pretty'.. limbs are hidden in a nested private class behind an interface.. I would be breaking what is a good design with neat encapsulation. (I accidentally forced-pushed that earlier, if you are curious; non-working). The current version of this code seems more robust in the long term? 
------------- PR: https://git.openjdk.org/jdk/pull/10582 From iwalulya at openjdk.org Wed Oct 26 16:27:31 2022 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 26 Oct 2022 16:27:31 GMT Subject: RFR: 8233697: CHT: Iteration parallelization [v5] In-Reply-To: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Message-ID: > Hi, > > Please review this change to add parallel iteration of the ConcurrentHashTable. The iteration should be done during a safepoint without concurrent modifications to the ConcurrentHashTable. > > Usecase is in parallelizing the merging of large remsets for G1. > > Some background: The problem is that particularly during (G1) mixed gc it happens that the distribution of contents in the CHT is very unbalanced - young gen regions have a very small remembered set (little work), and old gen regions very large ones (much work). > > Since the current work distribution is based on whole remembered sets (i.e. CHTs), this makes for a very unbalanced merge remsets phase in G1 when you have quite a bit more than the number of old gen regions threads at your disposal. > This negatively impacts pause time predictions (and obviously pause times are longer than necessary as many threads are idling to wait for the phase to complete). > > This change only adds the infrastructure code in the CHT, there will be a follow-up with G1 changes. > > Testing: tier 1-3 Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Robbin suggestion to use BucketsOperation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10759/files - new: https://git.openjdk.org/jdk/pull/10759/files/1d736e77..f84e1c1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10759&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10759&range=03-04 Stats: 291 lines in 4 files changed: 124 ins; 151 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/10759.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10759/head:pull/10759 PR: https://git.openjdk.org/jdk/pull/10759 From rkennke at openjdk.org Wed Oct 26 16:41:20 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 26 Oct 2022 16:41:20 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* Message-ID: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. 
Testing: - [x] GHA (x86 and x-compile failures look like infra glitch) - [x] tier1 - [x] tier2 - [x] tier3 - [x] tier4 ------------- Commit messages: - Improve condition in OM::has_owner() - Fix OM::has_owner() - 8295849: Consolidate Threads::owning_thread* Changes: https://git.openjdk.org/jdk/pull/10849/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10849&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295849 Stats: 68 lines in 7 files changed: 18 ins; 36 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/10849.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10849/head:pull/10849 PR: https://git.openjdk.org/jdk/pull/10849 From duke at openjdk.org Wed Oct 26 17:29:27 2022 From: duke at openjdk.org (zzambers) Date: Wed, 26 Oct 2022 17:29:27 GMT Subject: RFR: 8295952: Problemlist existing compiler/rtm tests also on x86 Message-ID: Problemlist should be extended so that existing compiler/rtm entries include x86 (32-bit) intel builds as well, as these are also affected. ------------- Commit messages: - Problemlist rtm issues also on i586 Changes: https://git.openjdk.org/jdk/pull/10875/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10875&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295952 Stats: 11 lines in 1 file changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/10875.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10875/head:pull/10875 PR: https://git.openjdk.org/jdk/pull/10875 From duke at openjdk.org Wed Oct 26 17:32:27 2022 From: duke at openjdk.org (zzambers) Date: Wed, 26 Oct 2022 17:32:27 GMT Subject: RFR: 8295952: Problemlist existing compiler/rtm tests also on x86 In-Reply-To: References: Message-ID: On Wed, 26 Oct 2022 16:43:26 GMT, zzambers wrote: > Problemlist should be extended so that existing compiler/rtm entries include x86 (32-bit) intel builds as well, as these are also affected. failures in GHA do not seem to be connected to this change ------------- PR: https://git.openjdk.org/jdk/pull/10875 From vlivanov at openjdk.org Wed Oct 26 17:55:24 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 26 Oct 2022 17:55:24 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: <18a7522c-c444-f908-d06d-681a9816cec5@oracle.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <18a7522c-c444-f908-d06d-681a9816cec5@oracle.com> Message-ID: On Wed, 26 Oct 2022 17:10:00 GMT, David Holmes wrote: > At most an expanded -Xcheck:jni check for FP-state manipulation, with an enhanced/fixed RestoreMXCSROnJNICalls is in order IMVHO. FTR both `-Xcheck:jni` and `-XX:+RestoreMXCSROnJNICalls` already catch the problematic case being discussed. $ .../jdk/bin/java ... compiler/floatingpoint/TestDenormalDouble Exception in thread "main" java.lang.AssertionError: TEST FAILED: 0.0 at compiler.floatingpoint.TestDenormalDouble.testDoubles(TestDenormalDouble.java:42) at compiler.floatingpoint.TestDenormalDouble.main(TestDenormalDouble.java:52) $ .../jdk/bin/java ... -Xcheck:jni compiler/floatingpoint/TestDenormalDouble Loading libfast-math.so Java HotSpot(TM) 64-Bit Server VM warning: MXCSR changed by native JNI code, use -XX:+RestoreMXCSROnJNICall Test passed. $ .../jdk/bin/java ... -XX:+RestoreMXCSROnJNICalls compiler/floatingpoint/TestDenormalDouble Loading libfast-math.so Test passed. 
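To make the state being detected here concrete: a DSO built with -ffast-math brings in a constructor that sets the FTZ and DAZ bits in MXCSR, which disables denormal arithmetic for the whole process. Below is a small, self-contained C++ sketch (not HotSpot's code; it only uses the standard x86 intrinsics and the architectural bit positions) that detects and undoes that state, roughly what -XX:+RestoreMXCSROnJNICalls does on the JNI transition:

    #include <cstdio>
    #include <xmmintrin.h>   // _mm_getcsr / _mm_setcsr, x86 only

    static const unsigned int kStdMxcsr = 0x1f80;   // default MXCSR: exceptions masked, FTZ/DAZ clear

    int main() {
        unsigned int mxcsr = _mm_getcsr();
        bool ftz = (mxcsr & 0x8000) != 0;   // bit 15: flush-to-zero
        bool daz = (mxcsr & 0x0040) != 0;   // bit 6: denormals-are-zero
        if (ftz || daz) {
            std::printf("MXCSR was changed (0x%x), restoring 0x%x\n", mxcsr, kStdMxcsr);
            _mm_setcsr(kStdMxcsr);           // put denormal handling back to the default
        } else {
            std::printf("MXCSR still has FTZ/DAZ clear (0x%x)\n", mxcsr);
        }
        return 0;
    }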
------------- PR: https://git.openjdk.org/jdk/pull/10661 From vlivanov at openjdk.org Wed Oct 26 18:07:26 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 26 Oct 2022 18:07:26 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic There's another option to consider: check MXCSR register consistency at safepoints. It's already too late to fix the corruption (the damage could be already done and the only way to proceed is to initiate a JVM crash), but it should pretty reliably catch the corruption in a prompt manner (so far, by default JVM checks and adjusts MXCSR on x86-64 only during upcalls into Java). Also, looks like it should mix well with the fast approximate check on JNI calls (which should be able to catch & heal some of the possible corruptions). ------------- PR: https://git.openjdk.org/jdk/pull/10661 From matsaave at openjdk.org Wed Oct 26 19:22:34 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 26 Oct 2022 19:22:34 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v2] In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: <1bfoEKP2K6f7eJ-FyU5uNN7L2TWWxUVX8U_x0FJitDY=.d7188708-562e-48f6-a18a-8d244d4dd42c@github.com> > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. 
> > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 > 10 invokedynamic bsm=31 13 > BSM: REF_invokeStatic 32 > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 > 18 iinc #1 1 > 21 goto 2 > 24 return Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Added null check and resource mark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10860/files - new: https://git.openjdk.org/jdk/pull/10860/files/0a4c53b7..f007d46b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10860.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10860/head:pull/10860 PR: https://git.openjdk.org/jdk/pull/10860 From matsaave at openjdk.org Wed Oct 26 19:24:06 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 26 Oct 2022 19:24:06 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: 
<_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: On Tue, 25 Oct 2022 19:37:12 GMT, Matias Saavedra Silva wrote: > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. > > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 > 10 invokedynamic bsm=31 13 > BSM: REF_invokeStatic 32 > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 > 18 iinc #1 1 > 21 goto 2 > 24 return I added a null check for method, a ResourseMark, and I replaced the else if with an else and an assert. Thank you for the corrections! ------------- PR: https://git.openjdk.org/jdk/pull/10860 From dcubed at openjdk.org Wed Oct 26 19:29:27 2022 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Wed, 26 Oct 2022 19:29:27 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* In-Reply-To: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: On Tue, 25 Oct 2022 11:39:37 GMT, Roman Kennke wrote: > There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: > - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. > - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. > > Testing: > - [x] GHA (x86 and x-compile failures look like infra glitch) > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [x] tier4 I have a concern about the new `has_owner()` function and whether it might cause problems with dead lock detection when an ObjectMonitor is being deflated. src/hotspot/share/runtime/objectMonitor.inline.hpp line 62: > 60: void* owner = owner_raw(); > 61: return owner != NULL || owner == DEFLATER_MARKER; > 62: } Why does has_owner() return `true` when `owner == DEFLATER_MARKER`? I'm only seeing one caller to the new `has_owner()` function in `ThreadService::find_deadlocks_at_safepoint()` and I don't understand why that code needs to think `has_owner()` needs to be `true` if the target ObjectMonitor is being deflated. That new `has_owner()` call will result in calling `Threads::owning_thread_from_monitor()` with `waitingToLockMonitor` which is being deflated. So the return from `Threads::owning_thread_from_monitor()` will be `NULL` which will result in us taking the `num_deadlocks++` code path. If I'm reading this right, then we'll report a deflating monitor as being in a deadlock. What am I missing here? ------------- Changes requested by dcubed (Reviewer). PR: https://git.openjdk.org/jdk/pull/10849 From tsteele at openjdk.org Wed Oct 26 20:00:30 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:30 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers Message-ID: This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. ------------- Commit messages: - Fixup - Remove tmp register from s390 nmethod_entry_barrier. Use R0 & R1 implicitly - Invert condition - Add missing include to s390.ad - Change s390.ad register - Move s390.ad code to proper position - Remove uneeded #include - Incorporate MD's suggestions - Minor clean up - Change arg computation for vm-call in line with Martin's feedback. Remove redundant instr. - ... 
and 26 more: https://git.openjdk.org/jdk/compare/085949a1...0a683eee Changes: https://git.openjdk.org/jdk/pull/10558/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294729 Stats: 207 lines in 10 files changed: 198 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From mdoerr at openjdk.org Wed Oct 26 20:00:36 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Oct 2022 20:00:36 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Tue, 4 Oct 2022 14:27:09 GMT, Tyler Steele wrote: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. I've taken a quick look. Please find my change requests. Please find my change requests. Please fix the offset computation for Z_ARG1 and try your tests again. LGTM. I think you can mark it ready for review if the tests are passing. Found another missing piece. Looks complete, now. I just spotted the bug (see below). About the register usage: Passing R0 as tmp while using R1 implicitly is inconsistent. I'd prefer e.g. using both regs implicitly and removing the tmp argument. You were also wondering about usage of an nv reg on PPC64: Don't confuse C calling convention with Java calling convention! Java callers don't expect them to be preserved. One more missing part: nmethod_entry_barrier + `C->output()->set_frame_complete(cbuf.insts_size());` at the end of C2 MachPrologNode::emit (s390.ad). Please check the build log in the Pre-submit tests. Seems like an `include` is missing. Can probably get reproduced by building without precompiled headers (configure flag `--disable-precompiled-headers`). 1. Why don?t you use R0 like at all other places? You could even remove the argument and use R0 inside of `nmethod_entry_barrier`. 2. There?s a `source %{` section. PPC64 uses it to `#include "oops/klass.inline.hpp?`. 3. Looks like R2 got overwritten at some point. You should see more information in the hs_err file and figure out where it was called (C1 method, C2 method, native wrapper). For that, you could try switching off C1 by -XX:-TieredCompilation or C2 by -XX:TieredStopAtLevel=1. src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 95: > 93: > 94: // Conditional Jump > 95: __ z_bcr(Assembler::bcondNotEqual, Z_R1_scratch); // 2 bytes This is only a jump. We would need a call which sets Z_R14 = return address. It should be possible to set Z_R14 manually to the return address before the jump (z_lghrl). Alternatively, you could implement a stub like on x86_64. src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 49: > 47: > 48: public: > 49: static const int BARRIER_TOTAL_LENGTH = GUARD_INSTRUCTION_OFFSET + 2*6 + 2; // bytes Please either use 14 or something which matches the sequence: 4 (patchable constant) + 6 (larl) + 4 (bcr) src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 60: > 58: int32_t* data_addr = (int32_t*)get_patchable_data_address(); > 59: > 60: Extra empty line. src/hotspot/cpu/s390/s390.ad line 851: > 849: Compile* C = ra_->C; > 850: C2_MacroAssembler _masm(&cbuf); > 851: Register nmethod_tmp = Z_R3; I guess we can't kill Z_R3, here. 
src/hotspot/cpu/s390/s390.ad line 851: > 849: Compile* C = ra_->C; > 850: C2_MacroAssembler _masm(&cbuf); > 851: // Register nmethod_tmp = Z_R3; Don't kill R3! I'd use R0 and R1 only. src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 1546: > 1544: > 1545: BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler(); > 1546: bs->nmethod_entry_barrier(masm, Z_R3); Don't kill R3! I'd use R0 and R1 only. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2869: > 2867: // Save caller's sp & return_pc > 2868: __ push_frame(frame::z_abi_16_size); > 2869: __ z_stmg(Z_R14, Z_R15, _z_abi16(callers_sp), Z_SP); Please use `save_return_pc()`. Z_R15 = Z_SP is already saved by push_frame. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2869: > 2867: // Save caller's sp & return_pc > 2868: __ push_frame(frame::z_abi_16_size); > 2869: __ save_return_pc(); Wrong order: return pc needs to get stored in the caller's frame header (before push_frame). See PPC64 or other usages on s390. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2875: > 2873: // We construct a pointer to the location of R14 stored above. > 2874: __ z_xgr(Z_R2, Z_R2); > 2875: __ z_ag(Z_R2, _z_abi(return_pc), 0, Z_SP); Please use a better fitting instruction like z_la. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2876: > 2874: __ z_lay(Z_R1_scratch, -32, Z_R0, Z_R14); // R1 <- R14 - 32 > 2875: __ z_stg(Z_R1_scratch, _z_abi(carg_2), Z_R0, Z_SP); // SP[abi_carg2] <- R1 > 2876: __ z_la(Z_ARG1, _z_abi(carg_2), Z_R0, Z_SP); // R2 <- SP + abi_carg2 Z_ARG1 should point to the address _z_abi16(return_pc) + Z_SP in the caller frame. (Don't generate a copy!) That matches _z_abi16(return_pc) + current frame size + Z_SP in the current frame at this point. In addition, I'm missing save_volatile_gprs & restore_volatile_gprs for GP and FP regs. I think they should get saved directly before you use Z_ARG1 for the return pc address and restored after the call_VM_leaf + z_ltr(Z_RET, Z_RET) which needs to get moved before the restoration. Note that this will need extra stack space: (5 + 8) * BytesPerWord (See `MacroAssembler::verify_oop` for reference, but note that you don't need to include_flags which reduces complexity.) src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2880: > 2878: __ z_stg(Z_R1_scratch, _z_abi(carg_2), Z_R0, Z_SP); // SP[abi_carg2] <- R1 > 2879: __ z_la(Z_ARG1, _z_abi(carg_2), Z_R0, Z_SP); // R2 <- SP + abi_carg2 > 2880: // __ z_la(Z_ARG1, _z_abi(return_pc), Z_R0, Z_SP); Offset needs to be computed relative to the callee's SP, here: _z_abi(return_pc) + frame::z_abi_160_size + nbytes_volatile src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2883: > 2881: > 2882: // Restore caller's sp & return_pc > 2883: __ z_lmg(Z_R14, Z_R15, _z_abi(callers_sp), Z_SP); Like above. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2883: > 2881: // Restore caller's sp & return_pc > 2882: __ restore_return_pc(); > 2883: __ pop_frame(); Like above. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2890: > 2888: // if (return val != 0) > 2889: // return to caller > 2890: __ z_bcr(Assembler::bcondNotZero, Z_R14); The condition is inverted! Should be `bcondZero`. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2894: > 2892: // if (return val != 0) > 2893: // return to caller > 2894: __ z_ltr(Z_R0_scratch, Z_R0_scratch); This is redundant. Flags were already set above and not killed by restore_volatile_regs + pop_frame + restore_return_pc. 
src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2895: > 2893: // Get handle to wrong-method-stub for s390 > 2894: __ load_const_optimized(Z_R1_scratch, SharedRuntime::get_handle_wrong_method_stub()); > 2895: __ z_br(Z_R1_scratch); Missing: Pop the frame built in the prologue and load the respective return_pc before the jump (see PPC64). ------------- Changes requested by mdoerr (Reviewer). PR: https://git.openjdk.org/jdk/pull/10558Marked as reviewed by mdoerr (Reviewer). From lucy at openjdk.org Wed Oct 26 20:00:37 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Oct 2022 20:00:37 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Tue, 4 Oct 2022 14:27:09 GMT, Tyler Steele wrote: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Changes requested by lucy (Reviewer). Changes requested by lucy (Reviewer). src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp line 79: > 77: > 78: BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler(); > 79: bs->nmethod_entry_barrier(this, Z_R3); Ha! Another use of Z_R3? Is it safe here? Probably not. In s390 code, you can only use Z_R0_scratch and Z_R1_scratch safely as scratch registers. Because everybody codes with that knowledge, you can't rely on these scratch registers to be preserved across generator calls - except when you have the entire call chain under your control. When you pass one of the scratch registers as tmp register to a generator, you have to know how the tmp is used. When it's used in address calculation or to hold the address of something, you can't pass Z_R0_scratch. You for sure know that. src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 27: > 25: > 26: #include "precompiled.hpp" > 27: #include "asm/macroAssembler.hpp" This `#include` is unnecessary. It comes in via `#include "asm/macroAssembler.inline.hpp"` These are the rules for dependent includes: - if you need an ``-specific `.inline.hpp`, include the shared code counterpart instead. This will in turn include all the prereqs, especially `.hpp`, and then include the ``-specific `inline.hpp` via `CPU_HEADER_INLINE()`. - if you need an ``-specific `.hpp`, include the shared code counterpart instead. This will in turn include all the prereqs and then include the ``-specific `.hpp` via `CPU_HEADER()`. - Note that these rules are unfortunately not followed everywhere, leading to a lot of confusion and the insertion of arbitrary `#include` statements "until the thing builds". src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 93: > 91: // Compare to current patched value: > 92: __ z_cfi(tmp, /* to be patched */ -1); // 6 bytes (2 + 4 byte imm val) > 93: What about the value at `thread_disarmed_offset()`? Is it 4 bytes or 8 bytes? If it is 4 bytes, you should load it with `z_l(tmp, ...);` or `z_ly(tmp, ...);` if the offset might be negative. Similar size considerations on the compare: `z_cfi(tmp, ...);` is good for 4-byte value. `z_cgfi(tmp, ...);` is good for 8-byte values. Caution: signed compare. Sign-extension of the immediate may produce unexpected behaviour. 
src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 64: > 62: } > 63: > 64: void verify() const { Shouldn't the complete function body be encapsulated with #ifdef ASSERT If ASSERT is undefined, the code has no externally visible effect and thus is irrelevant. src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 81: > 79: assert(Assembler::is_equal(start[offset], BCR_ZOPC, RIL_MASK), "check BCR"); > 80: offset += Assembler::instr_len((unsigned long)BCR_ZOPC); > 81: I don't like specifying the same information (here: instruction opcode) multiple times. I would suggest you increment the offset with code like `offset += Assembler::instr_len(&start[offset]);` src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2891: > 2889: __ z_cfi(Z_R2, 0); > 2890: __ z_bcr(Assembler::bcondNotEqual, Z_R14); > 2891: `z_ltr(Z_R2, Z_R2);` is the preferred way of testing register contents. I would then use the "speaking" alias `z_brnz(Z_R14); Note: there are subtle semantic differences between "not equal" and "not zero". See assembler_s390.hpp. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2899: > 2897: // Call wrong-method-stub > 2898: __ z_br(Z_R2); > 2899: This is dead code. Or should the branch above be a call instead? ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:37 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:37 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Tue, 25 Oct 2022 10:35:08 GMT, Martin Doerr wrote: > Please check the build log in the Pre-submit tests. It looks like there are 2 issues in the pre-submit tests: 1. I copied a named register variable use without copying it's declaration. I see PPC used R22, which seems odd to me. Should I use a non-volatile register on s390 as well? 2. An import issue that complains about BarrierSetAssembler, but there are no c-style imports in s390.ad. How do I control the header files imported in .ad files? I am also encountering an issue on my builds: # Internal Error (/home/ty/openjdk/jdk-current/src/hotspot/share/runtime/frame.cpp:1082), pid=1276093, tid=1276098 # assert(Universe::heap()->is_in_or_null(r)) failed: bad receiver: 0x000003ffb48ebb60 (4396780796768) Which happens after the jump to wrong_method_stub. I've seen errors here on and off throughout development, and I was really hoping that adding the missing pop_frame and restore_return_pc before the jump there was the missing piece. > About the register usage I think this is a good point. I've changed s390's nmethod_entry_barrier to take no tmp reg, and use R0 and R1 implicitly. > Don't confuse C calling convention with Java calling convention! I sometimes do make the error of thinking that registers have platform imposed restrictions on their behaviour. Thanks for pointing this out. > src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 49: > >> 47: >> 48: public: >> 49: static const int BARRIER_TOTAL_LENGTH = GUARD_INSTRUCTION_OFFSET + 2*6 + 2; // bytes > > Please either use 14 or something which matches the sequence: 4 (patchable constant) + 6 (larl) + 4 (bcr) GUARD_INSTRUCTION_OFFSET is the offset to the beginning of the patchable instruction. So, I believe it should be: 6 (cfi) + 6 (larl) + 2 (bcr). It may be worth renaming 'GUARD_INSTRUCTION_OFFSET' as I feel it's a bit confusing. 
> src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 60: > >> 58: int32_t* data_addr = (int32_t*)get_patchable_data_address(); >> 59: >> 60: > > Extra empty line. Removed. Thanks. > src/hotspot/cpu/s390/s390.ad line 851: > >> 849: Compile* C = ra_->C; >> 850: C2_MacroAssembler _masm(&cbuf); >> 851: Register nmethod_tmp = Z_R3; > > I guess we can't kill Z_R3, here. I have removed the reference to Z_R3. > src/hotspot/cpu/s390/s390.ad line 851: > >> 849: Compile* C = ra_->C; >> 850: C2_MacroAssembler _masm(&cbuf); >> 851: // Register nmethod_tmp = Z_R3; > > Don't kill R3! I'd use R0 and R1 only. I would guess this is because, even though it's a volatile register, R3 can be a parameter or return value. So in the context of the compiler, we really don't want to use this register? BTW, I have now removed my changes to this file. > src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 1546: > >> 1544: >> 1545: BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler(); >> 1546: bs->nmethod_entry_barrier(masm, Z_R3); > > Don't kill R3! I'd use R0 and R1 only. Thanks! > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2869: > >> 2867: // Save caller's sp & return_pc >> 2868: __ push_frame(frame::z_abi_16_size); >> 2869: __ save_return_pc(); > > Wrong order: return pc needs to get stored in the caller's frame header (before push_frame). See PPC64 or other usages on s390. Thanks. > Z_ARG1 should point to the address _z_abi16(return_pc) + Z_SP in the caller frame. This matches what the PPC implementation does, but when I do the same thing on s390 I get a cache miss in nmethod_stub_entry_barrier (the vm-call). It looked as though CodeCache::find_blob expects the address of the start of the compiled code, so I tried subtracting the size of the barrier from R14 (which currently points to end of the barrier in the compiled frame). After doing this I no longer saw the CodeCache miss. > I'm missing save_volatile_gprs & restore_volatile_gprs for GP and FP regs. I had been trying to get the volatile registers saved, but didn't have any luck. I tried it today with your suggestions and it worked like a charm. Not sure what the difference was. Thanks for the pointers. > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2880: > >> 2878: __ z_stg(Z_R1_scratch, _z_abi(carg_2), Z_R0, Z_SP); // SP[abi_carg2] <- R1 >> 2879: __ z_la(Z_ARG1, _z_abi(carg_2), Z_R0, Z_SP); // R2 <- SP + abi_carg2 >> 2880: // __ z_la(Z_ARG1, _z_abi(return_pc), Z_R0, Z_SP); > > Offset needs to be computed relative to the callee's SP, here: _z_abi(return_pc) + frame::z_abi_160_size + nbytes_volatile Got it. Thanks. > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2890: > >> 2888: // if (return val != 0) >> 2889: // return to caller >> 2890: __ z_bcr(Assembler::bcondNotZero, Z_R14); > > The condition is inverted! Should be `bcondZero`. That appears to have solved it. Many thanks ? > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2894: > >> 2892: // if (return val != 0) >> 2893: // return to caller >> 2894: __ z_ltr(Z_R0_scratch, Z_R0_scratch); > > This is redundant. Flags were already set above and not killed by restore_volatile_regs + pop_frame + restore_return_pc. Agreed. Thanks for catching that. 
> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2895: > >> 2893: // Get handle to wrong-method-stub for s390 >> 2894: __ load_const_optimized(Z_R1_scratch, SharedRuntime::get_handle_wrong_method_stub()); >> 2895: __ z_br(Z_R1_scratch); > > Missing: Pop the frame built in the prologue and load the respective return_pc before the jump (see PPC64). I saw that the PPC impl does that, and had been experimenting with it. But it always felt weird to call pop_frame twice here. Thanks for confirming that it is necessary. I believe I have a better understanding of why this is needed. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:37 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:37 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Tue, 4 Oct 2022 14:27:09 GMT, Tyler Steele wrote: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Thanks for your answers :slightly_smiling_face: 1. I wanted to make sure there weren't conflicts with using R0 in s390.ad. Switching to R0 seems to have resolved the issue. 2. Thanks for clarifying. Yes, it seems they are c-style imports, but wrapped in `source %{ ... }` sections, not at the beginning of the file where I was expecting them. This issue is fixed as well. 3. I see why you may be thinking that R2 is the problem, but I don't see where the issue is. AFIK, R2 should be restored to its original value before the jump to wrong_method_stub. - The hs_err file is certainly useful. But, I wasn't able to pass the -XX args you mentioned because I encounter the issue during the optimized build phase. Is there a way to pass those args to the build jvm? Switching to a PR as the implementation is now complete. The tests in `hotspot/jtreg/gc` all pass, and I am running T1 currently. The pre-test failures appear to be unrelated to the change. Thanks again to @TheRealMDoerr and @RealLucy for their prodigious help ? ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:37 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:37 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: <64lENqcjoFCceNSLxNfmN6Kv3HNWundZqMHl-yvoSvk=.f02fb437-d92d-40ca-9da6-53c1e94df259@github.com> On Tue, 11 Oct 2022 16:17:29 GMT, Lutz Schmidt wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp line 79: > >> 77: >> 78: BarrierSetAssembler* bs = BarrierSet::barrier_set()->barrier_set_assembler(); >> 79: bs->nmethod_entry_barrier(this, Z_R3); > > Ha! Another use of Z_R3? Is it safe here? Probably not. > > In s390 code, you can only use Z_R0_scratch and Z_R1_scratch safely as scratch registers. Because everybody codes with that knowledge, you can't rely on these scratch registers to be preserved across generator calls - except when you have the entire call chain under your control. > > When you pass one of the scratch registers as tmp register to a generator, you have to know how the tmp is used. 
When it's used in address calculation or to hold the address of something, you can't pass Z_R0_scratch. You for sure know that. Good catch! I have changed this. > src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 27: > >> 25: >> 26: #include "precompiled.hpp" >> 27: #include "asm/macroAssembler.hpp" > > This `#include` is unnecessary. It comes in via > `#include "asm/macroAssembler.inline.hpp"` > These are the rules for dependent includes: > - if you need an ``-specific `.inline.hpp`, include the shared code counterpart instead. This will in turn include all the prereqs, especially `.hpp`, and then include the ``-specific `inline.hpp` via `CPU_HEADER_INLINE()`. > - if you need an ``-specific `.hpp`, include the shared code counterpart instead. This will in turn include all the prereqs and then include the ``-specific `.hpp` via `CPU_HEADER()`. > - Note that these rules are unfortunately not followed everywhere, leading to a lot of confusion and the insertion of arbitrary `#include` statements "until the thing builds". Thanks for explaining this. I always felt like there was a structure here, but wasn't sure the rules. Knowing about the CPU_HEADER macros is also especially useful. > src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 64: > >> 62: } >> 63: >> 64: void verify() const { > > Shouldn't the complete function body be encapsulated with > #ifdef ASSERT > If ASSERT is undefined, the code has no externally visible effect and thus is irrelevant. This change is reasonable. Thanks for the suggestion. > src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 81: > >> 79: assert(Assembler::is_equal(start[offset], BCR_ZOPC, RIL_MASK), "check BCR"); >> 80: offset += Assembler::instr_len((unsigned long)BCR_ZOPC); >> 81: > > I don't like specifying the same information (here: instruction opcode) multiple times. I would suggest you increment the offset with code like > `offset += Assembler::instr_len(&start[offset]);` I like this. I have made this change. > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2891: > >> 2889: __ z_cfi(Z_R2, 0); >> 2890: __ z_bcr(Assembler::bcondNotEqual, Z_R14); >> 2891: > > `z_ltr(Z_R2, Z_R2);` > is the preferred way of testing register contents. I would then use the "speaking" alias > `z_brnz(Z_R14); > Note: there are subtle semantic differences between "not equal" and "not zero". See assembler_s390.hpp. I changed to the preferred test instruction. However, it seems like z_brnz won't work in this situation because the branch address is in a register and not a label. Side note it too me a second to realize that I am using bcr and brnz is an alias for brc. It seems odd that there isn't a "speaking" alias for bcr (or maybe I'm just not finding it). > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2899: > >> 2897: // Call wrong-method-stub >> 2898: __ z_br(Z_R2); >> 2899: > > This is dead code. > Or should the branch above be a call instead? You're right; this is dead code. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From lucy at openjdk.org Wed Oct 26 20:00:38 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Oct 2022 20:00:38 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Fri, 7 Oct 2022 10:42:20 GMT, Lutz Schmidt wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. 
> > src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 93: > >> 91: // Compare to current patched value: >> 92: __ z_cfi(tmp, /* to be patched */ -1); // 6 bytes (2 + 4 byte imm val) >> 93: > > What about the value at `thread_disarmed_offset()`? Is it 4 bytes or 8 bytes? If it is 4 bytes, you should load it with > `z_l(tmp, ...);` or > `z_ly(tmp, ...);` if the offset might be negative. > Similar size considerations on the compare: > `z_cfi(tmp, ...);` is good for 4-byte value. > `z_cgfi(tmp, ...);` is good for 8-byte values. Caution: signed compare. Sign-extension of the immediate may produce unexpected behaviour. One more thing: What's the value range of thread_disarmed? Does it use less than 16 bits in all cases? Then you could exploit a storage-immediate variant of compare if the offset is positive: `z_clfhsi(in_bytes(bs_nm->thread_disarmed_offset()), Z_thread, /* to be patched */ -1);` 4-byte `z_clghsi(in_bytes(bs_nm->thread_disarmed_offset()), Z_thread, /* to be patched */ -1);` 8-byte 2-byte immediate in both cases. Caution: sign extension! No need for a tmp register anymore! ------------- PR: https://git.openjdk.org/jdk/pull/10558 From mdoerr at openjdk.org Wed Oct 26 20:00:38 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Oct 2022 20:00:38 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Fri, 7 Oct 2022 10:51:23 GMT, Lutz Schmidt wrote: >> src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 93: >> >>> 91: // Compare to current patched value: >>> 92: __ z_cfi(tmp, /* to be patched */ -1); // 6 bytes (2 + 4 byte imm val) >>> 93: >> >> What about the value at `thread_disarmed_offset()`? Is it 4 bytes or 8 bytes? If it is 4 bytes, you should load it with >> `z_l(tmp, ...);` or >> `z_ly(tmp, ...);` if the offset might be negative. >> Similar size considerations on the compare: >> `z_cfi(tmp, ...);` is good for 4-byte value. >> `z_cgfi(tmp, ...);` is good for 8-byte values. Caution: signed compare. Sign-extension of the immediate may produce unexpected behaviour. > > One more thing: > What's the value range of thread_disarmed? Does it use less than 16 bits in all cases? Then you could exploit a storage-immediate variant of compare if the offset is positive: > `z_clfhsi(in_bytes(bs_nm->thread_disarmed_offset()), Z_thread, /* to be patched */ -1);` 4-byte > `z_clghsi(in_bytes(bs_nm->thread_disarmed_offset()), Z_thread, /* to be patched */ -1);` 8-byte > 2-byte immediate in both cases. Caution: sign extension! > No need for a tmp register anymore! `_nmethod_disarm_value` is 64 bit of which the high order 32 bits are only used on aarch64. Other platforms use a 4 Byte access, so `z_cfi` is correct. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From lucy at openjdk.org Wed Oct 26 20:00:38 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Oct 2022 20:00:38 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Fri, 7 Oct 2022 11:14:52 GMT, Martin Doerr wrote: >> One more thing: >> What's the value range of thread_disarmed? Does it use less than 16 bits in all cases? Then you could exploit a storage-immediate variant of compare if the offset is positive: >> `z_clfhsi(in_bytes(bs_nm->thread_disarmed_offset()), Z_thread, /* to be patched */ -1);` 4-byte >> `z_clghsi(in_bytes(bs_nm->thread_disarmed_offset()), Z_thread, /* to be patched */ -1);` 8-byte >> 2-byte immediate in both cases. Caution: sign extension! 
>> No need for a tmp register anymore! > > `_nmethod_disarm_value` is 64 bit of which the high order 32 bits are only used on aarch64. Other platforms use a 4 Byte access, so `z_cfi` is correct. How is the value stored then? Using a 8-byte store? Hopefully... ------------- PR: https://git.openjdk.org/jdk/pull/10558 From mdoerr at openjdk.org Wed Oct 26 20:00:38 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Oct 2022 20:00:38 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Fri, 7 Oct 2022 12:41:47 GMT, Lutz Schmidt wrote: >> `_nmethod_disarm_value` is 64 bit of which the high order 32 bits are only used on aarch64. Other platforms use a 4 Byte access, so `z_cfi` is correct. > > How is the value stored then? Using a 8-byte store? Hopefully... Yes, `_nmethod_disarm_value = (uint64_t)(uint32_t)value;` ------------- PR: https://git.openjdk.org/jdk/pull/10558 From lucy at openjdk.org Wed Oct 26 20:00:38 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Oct 2022 20:00:38 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Fri, 7 Oct 2022 09:22:26 GMT, Martin Doerr wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 95: > >> 93: >> 94: // Conditional Jump >> 95: __ z_bcr(Assembler::bcondNotEqual, Z_R1_scratch); // 2 bytes > > This is only a jump. We would need a call which sets Z_R14 = return address. It should be possible to set Z_R14 manually to the return address before the jump (z_lghrl). Alternatively, you could implement a stub like on x86_64. I you really want to use a nerdy hack, I would suggest you use `z_larl(Z_R14, (instr_len((unsigned long)LARL_ZOPC) + instr_len((unsigned long)BCR_ZOPC)) / 2);` `z_bcr(Assembler::bcondNotEqual, Z_R1_scratch); ` > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2869: > >> 2867: // Save caller's sp & return_pc >> 2868: __ push_frame(frame::z_abi_16_size); >> 2869: __ z_stmg(Z_R14, Z_R15, _z_abi16(callers_sp), Z_SP); > > Please use `save_return_pc()`. Z_R15 = Z_SP is already saved by push_frame. You should not rely on Z_R14 being the return register in all circumstances and for all future. There are comments in the Principles of Operation and elsewhere that Z_R7 might assume a similar role. With save_return_pc() you are on the safe side and the code speaks for itself (no knowledge about Z_R14 required). > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2875: > >> 2873: // We construct a pointer to the location of R14 stored above. >> 2874: __ z_xgr(Z_R2, Z_R2); >> 2875: __ z_ag(Z_R2, _z_abi(return_pc), 0, Z_SP); > > Please use a better fitting instruction like z_la. The code as written here would load the contents of the stack location reserved for the return pc. Less complicated: would load the return pc. If you want to load the address of the storage location, you would use `z_la (Z_R2, _z_abi(return_pc), Z_R0, Z_SP);` Load Address in it's various forms just does the address calculation, it does not access storage. And please, use Z_R0 (not just 0) to specify an optional, unspecified register - here and at all other places. 
------------- PR: https://git.openjdk.org/jdk/pull/10558 From lucy at openjdk.org Wed Oct 26 20:00:38 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Oct 2022 20:00:38 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> On Fri, 7 Oct 2022 12:19:08 GMT, Lutz Schmidt wrote: >> src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 95: >> >>> 93: >>> 94: // Conditional Jump >>> 95: __ z_bcr(Assembler::bcondNotEqual, Z_R1_scratch); // 2 bytes >> >> This is only a jump. We would need a call which sets Z_R14 = return address. It should be possible to set Z_R14 manually to the return address before the jump (z_lghrl). Alternatively, you could implement a stub like on x86_64. > > I you really want to use a nerdy hack, I would suggest you use > `z_larl(Z_R14, (instr_len((unsigned long)LARL_ZOPC) + instr_len((unsigned long)BCR_ZOPC)) / 2);` > `z_bcr(Assembler::bcondNotEqual, Z_R1_scratch); > ` Edit: adapted to the fact that entry_barrier is an 8-byte field with only the rightmost 4 bytes being significant. If this isn't nerdy enough, I have another idea (**works only on z14 and newer**). It replaces the load_const at the beginning with a z_larl() and then uses a z_bic() to fetch the target address and branch there. The z_larl will never go out of range because all generated code is in the code cache which is limited to 4GB in size. In total we would get: __ z_larl(Z_R1_scratch, &StubRoutines::zarch::nmethod_entry_barrier()); __ z_clghsi(in_bytes(bs_nm->thread_disarmed_offset()), Z_thread, /* to be patched */ -1); __ z_larl(Z_R14, (instr_len((unsigned long)LARL_ZOPC) + instr_len((unsigned long)BIC_ZOPC)) / 2); __ z_bic(Assembler::bcondNotEqual, 0, Z_R0, Z_R1_scratch); Nice, short, compact. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:38 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:38 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> References: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> Message-ID: On Fri, 7 Oct 2022 12:36:29 GMT, Lutz Schmidt wrote: >> I you really want to use a nerdy hack, I would suggest you use >> `z_larl(Z_R14, (instr_len((unsigned long)LARL_ZOPC) + instr_len((unsigned long)BCR_ZOPC)) / 2);` >> `z_bcr(Assembler::bcondNotEqual, Z_R1_scratch); >> ` > > Edit: adapted to the fact that entry_barrier is an 8-byte field with only the rightmost 4 bytes being significant. > > If this isn't nerdy enough, I have another idea (**works only on z14 and newer**). It replaces the load_const at the beginning with a z_larl() and then uses a z_bic() to fetch the target address and branch there. The z_larl will never go out of range because all generated code is in the code cache which is limited to 4GB in size. In total we would get: > > __ z_larl(Z_R1_scratch, &StubRoutines::zarch::nmethod_entry_barrier()); > __ z_clghsi(in_bytes(bs_nm->thread_disarmed_offset()), Z_thread, /* to be patched */ -1); > __ z_larl(Z_R14, (instr_len((unsigned long)LARL_ZOPC) + instr_len((unsigned long)BIC_ZOPC)) / 2); > __ z_bic(Assembler::bcondNotEqual, 0, Z_R0, Z_R1_scratch); > > Nice, short, compact. Well spotted. 
I hadn't appreciated that this instruction wasn't setting the return address. I am leaning towards the first ~nerdy~ cool hack if it's more general, though I do appreciate the compactness of the second suggestion. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From mdoerr at openjdk.org Wed Oct 26 20:00:39 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Oct 2022 20:00:39 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> Message-ID: On Fri, 7 Oct 2022 14:11:42 GMT, Tyler Steele wrote: >> Edit: adapted to the fact that entry_barrier is an 8-byte field with only the rightmost 4 bytes being significant. >> >> If this isn't nerdy enough, I have another idea (**works only on z14 and newer**). It replaces the load_const at the beginning with a z_larl() and then uses a z_bic() to fetch the target address and branch there. The z_larl will never go out of range because all generated code is in the code cache which is limited to 4GB in size. In total we would get: >> >> __ z_larl(Z_R1_scratch, &StubRoutines::zarch::nmethod_entry_barrier()); >> __ z_clghsi(in_bytes(bs_nm->thread_disarmed_offset()), Z_thread, /* to be patched */ -1); >> __ z_larl(Z_R14, (instr_len((unsigned long)LARL_ZOPC) + instr_len((unsigned long)BIC_ZOPC)) / 2); >> __ z_bic(Assembler::bcondNotEqual, 0, Z_R0, Z_R1_scratch); >> >> Nice, short, compact. > > Well spotted. I hadn't appreciated that this instruction wasn't setting the return address. > > I am leaning towards the first ~nerdy~ cool hack if it's more general, though I do appreciate the compactness of the second suggestion. Doesn't `z_larl(Z_R1_scratch, &StubRoutines::zarch::nmethod_entry_barrier())` always work? It uses a relocation and can reach any address within the code cache. I think this makes sense regardless of the z14 choice. (Note that there are complicated offset computations and processor model dependent code won't make it easier :-) ) ------------- PR: https://git.openjdk.org/jdk/pull/10558 From lucy at openjdk.org Wed Oct 26 20:00:39 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Oct 2022 20:00:39 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> Message-ID: <9ypU9IbcaaQBdf8U1-SkfwOHsh1qen18KQxw1bMiLUU=.219e7cc6-c9f8-48dd-9e9b-4b52bff7fa4d@github.com> On Fri, 7 Oct 2022 14:25:37 GMT, Martin Doerr wrote: >> Well spotted. I hadn't appreciated that this instruction wasn't setting the return address. >> >> I am leaning towards the first ~nerdy~ cool hack if it's more general, though I do appreciate the compactness of the second suggestion. > > Doesn't `z_larl(Z_R1_scratch, &StubRoutines::zarch::nmethod_entry_barrier())` always work? It uses a relocation and can reach any address within the code cache. I think this makes sense regardless of the z14 choice. (Note that there are complicated offset computations and processor model dependent code won't make it easier :-) ) Yes, my "optimisations" probably will be of no use because a) the compare value has. to be 4 bytes and b) the bic instruction is available from z14 only. Anyway, you get a glimpse of the beauty of z programming. :-) @TheRealMDoerr Yes, z_larl() will always work within the bounds of CodeCache. But it only gives you the address where the address is stored. 
You need another z_lg() to get the branch address. And then there is no benefit anymore compared to two 4-byte immediate loads. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From mdoerr at openjdk.org Wed Oct 26 20:00:39 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Oct 2022 20:00:39 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: <9ypU9IbcaaQBdf8U1-SkfwOHsh1qen18KQxw1bMiLUU=.219e7cc6-c9f8-48dd-9e9b-4b52bff7fa4d@github.com> References: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> <9ypU9IbcaaQBdf8U1-SkfwOHsh1qen18KQxw1bMiLUU=.219e7cc6-c9f8-48dd-9e9b-4b52bff7fa4d@github.com> Message-ID: On Fri, 7 Oct 2022 14:45:01 GMT, Lutz Schmidt wrote: >> Doesn't `z_larl(Z_R1_scratch, &StubRoutines::zarch::nmethod_entry_barrier())` always work? It uses a relocation and can reach any address within the code cache. I think this makes sense regardless of the z14 choice. (Note that there are complicated offset computations and processor model dependent code won't make it easier :-) ) > > @TheRealMDoerr Yes, z_larl() will always work within the bounds of CodeCache. But it only gives you the address where the address is stored. You need another z_lg() to get the branch address. And then there is no benefit anymore compared to two 4-byte immediate loads. The branch target is constant once the stub is created. We don't need to load it from memory. I meant `__ z_larl(Z_R1_scratch, StubRoutines::zarch::nmethod_entry_barrier());` ------------- PR: https://git.openjdk.org/jdk/pull/10558 From lucy at openjdk.org Wed Oct 26 20:00:39 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Oct 2022 20:00:39 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> <9ypU9IbcaaQBdf8U1-SkfwOHsh1qen18KQxw1bMiLUU=.219e7cc6-c9f8-48dd-9e9b-4b52bff7fa4d@github.com> Message-ID: <-65ZRyQ_R5YW64xfOXkpHMi6ITcv8Xe9CnbHmVTDyJ8=.0bc1ec2d-b9cc-40c8-b05f-f9c3a05288d8@github.com> On Fri, 7 Oct 2022 15:18:27 GMT, Martin Doerr wrote: >> @TheRealMDoerr Yes, z_larl() will always work within the bounds of CodeCache. But it only gives you the address where the address is stored. You need another z_lg() to get the branch address. And then there is no benefit anymore compared to two 4-byte immediate loads. > > The branch target is constant once the stub is created. We don't need to load it from memory. > I meant `__ z_larl(Z_R1_scratch, StubRoutines::zarch::nmethod_entry_barrier());` Good idea! One instruction instead of two, plus avoiding a data interlock when loading the constant address. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:39 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:39 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: <-65ZRyQ_R5YW64xfOXkpHMi6ITcv8Xe9CnbHmVTDyJ8=.0bc1ec2d-b9cc-40c8-b05f-f9c3a05288d8@github.com> References: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> <9ypU9IbcaaQBdf8U1-SkfwOHsh1qen18KQxw1bMiLUU=.219e7cc6-c9f8-48dd-9e9b-4b52bff7fa4d@github.com> <-65ZRyQ_R5YW64xfOXkpHMi6ITcv8Xe9CnbHmVTDyJ8=.0bc1ec2d-b9cc-40c8-b05f-f9c3a05288d8@github.com> Message-ID: On Mon, 10 Oct 2022 09:08:19 GMT, Lutz Schmidt wrote: >> The branch target is constant once the stub is created. We don't need to load it from memory. 
>> I meant `__ z_larl(Z_R1_scratch, StubRoutines::zarch::nmethod_entry_barrier());` > > Good idea! One instruction instead of two, plus avoiding a data interlock when loading the constant address. I made this change. Though I do want to confirm that `__ z_larl(Z_R1_scratch, StubRoutines::zarch::nmethod_entry_barrier());` is the correct version, and not `&StubRoutines::zarch::nmethod_entry_barrier()` which is also mentioned above. Since nmethod_entry_barrier returns an `address` (and address is already a `char*`) I don't believe we want the ampersand. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From lucy at openjdk.org Wed Oct 26 20:00:39 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Oct 2022 20:00:39 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> <9ypU9IbcaaQBdf8U1-SkfwOHsh1qen18KQxw1bMiLUU=.219e7cc6-c9f8-48dd-9e9b-4b52bff7fa4d@github.com> <-65ZRyQ_R5YW64xfOXkpHMi6ITcv8Xe9CnbHmVTDyJ8=.0bc1ec2d-b9cc-40c8-b05f-f9c3a05288d8@github.com> Message-ID: On Tue, 11 Oct 2022 15:14:19 GMT, Tyler Steele wrote: >> Good idea! One instruction instead of two, plus avoiding a data interlock when loading the constant address. > > I made this change. Though I do want to confirm that `__ z_larl(Z_R1_scratch, StubRoutines::zarch::nmethod_entry_barrier());` is the correct version, and not `&StubRoutines::zarch::nmethod_entry_barrier()` which is also mentioned above. Since nmethod_entry_barrier returns an `address` (and address is already a `char*`) I don't believe we want the ampersand. You are right. We don't want the '&'. I first assumed (disregarding the parentheses) this would be a field where the address of a generated stub is stored. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:39 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:39 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> <9ypU9IbcaaQBdf8U1-SkfwOHsh1qen18KQxw1bMiLUU=.219e7cc6-c9f8-48dd-9e9b-4b52bff7fa4d@github.com> <-65ZRyQ_R5YW64xfOXkpHMi6ITcv8Xe9CnbHmVTDyJ8=.0bc1ec2d-b9cc-40c8-b05f-f9c3a05288d8@github.com> Message-ID: On Tue, 11 Oct 2022 15:49:11 GMT, Lutz Schmidt wrote: >> I made this change. Though I do want to confirm that `__ z_larl(Z_R1_scratch, StubRoutines::zarch::nmethod_entry_barrier());` is the correct version, and not `&StubRoutines::zarch::nmethod_entry_barrier()` which is also mentioned above. Since nmethod_entry_barrier returns an `address` (and address is already a `char*`) I don't believe we want the ampersand. > > You are right. We don't want the '&'. I first assumed (disregarding the parentheses) this would be a field where the address of a generated stub is stored. I think I had the same thought when I wrote the code initially, so I was doubting my reasoning. 
Thanks for confirming :-) ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:39 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:39 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> <9ypU9IbcaaQBdf8U1-SkfwOHsh1qen18KQxw1bMiLUU=.219e7cc6-c9f8-48dd-9e9b-4b52bff7fa4d@github.com> <-65ZRyQ_R5YW64xfOXkpHMi6ITcv8Xe9CnbHmVTDyJ8=.0bc1ec2d-b9cc-40c8-b05f-f9c3a05288d8@github.com> Message-ID: On Tue, 11 Oct 2022 15:56:27 GMT, Tyler Steele wrote: >> You are right. We don't want the '&'. I first assumed (disregarding the parentheses) this would be a field where the address of a generated stub is stored. > > I think I had the same thought when I wrote the code initially, so I was doubting my reasoning. Thanks for confirming :-) After having another look at the description of `larl`, I don't think I understand why it would be a desirable replacement for `load_const`. I replaced `load_const(Z_R1_scratch, StubRoutines::zarch::nmethod_entry_barrier()); ... bcr(..., Z_R1_scratch);` with `larl(Z_R1_scratch, StubRoutines::zarch::nmethod_entry_barrier()); ... bcr(..., Z_R1_scratch);`. But this resulted in a bad jump since there is a mismatch between the address returned by `nmethod_entry_barrier()`, and the number of halfwords to be added to the current PC expected by `larl`. To make this work, it seems that I would need to compute the relative location between the current location and the address I want to jump to. Can I do this statically? ------------- PR: https://git.openjdk.org/jdk/pull/10558 From lucy at openjdk.org Wed Oct 26 20:00:40 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Oct 2022 20:00:40 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> <9ypU9IbcaaQBdf8U1-SkfwOHsh1qen18KQxw1bMiLUU=.219e7cc6-c9f8-48dd-9e9b-4b52bff7fa4d@github.com> <-65ZRyQ_R5YW64xfOXkpHMi6ITcv8Xe9CnbHmVTDyJ8=.0bc1ec2d-b9cc-40c8-b05f-f9c3a05288d8@github.com> Message-ID: On Tue, 11 Oct 2022 19:03:24 GMT, Tyler Steele wrote: >> I think I had the same thought when I wrote the code initially, so I was doubting my reasoning. Thanks for confirming :-) > > After having another look at the description of `larl`, I don't think I understand why it would be a desirable replacement for `load_const`. > > I replaced > `load_const(Z_R1_scratch, StubRoutines::zarch::nmethod_entry_barrier()); ... bcr(..., Z_R1_scratch);` > with > `larl(Z_R1_scratch, StubRoutines::zarch::nmethod_entry_barrier()); ... bcr(..., Z_R1_scratch);`. > But this resulted in a bad jump since there is a mismatch between the address returned by `nmethod_entry_barrier()`, and the number of halfwords to be added to the current PC expected by `larl`. > > To make this work, it seems that I would need to compute the relative location between the current location and the address I want to jump to. Can I do this statically? I agree: stick with the load_const as you suggested. You can't calculate the distance statically. The generated nmethod is copied around at least once when code generation is complete. You would need a relocation to handle that, and we don't have one. Sorry for all the discussion which led to just nothing. 
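For readers following along, the point about needing a relocation can be illustrated outside of HotSpot. Below is a minimal standalone C++ sketch (buffer and symbol names are made up here; this is not VM code) of why a distance computed against the generation-time buffer stops being valid once the finished code is copied to its final location:

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    static char stub_target = 0;   // stands in for a fixed stub outside the generated code

    int main() {
      char gen_buffer[32] = {0};   // where the code is first emitted
      char code_cache[32] = {0};   // where the finished blob ends up after copying

      // "Distance to the stub", computed while the code still lives in gen_buffer.
      intptr_t distance = (intptr_t)&stub_target - (intptr_t)&gen_buffer[0];

      std::memcpy(code_cache, gen_buffer, sizeof(gen_buffer));  // the copy step

      void* from_old_base = (void*)((intptr_t)&gen_buffer[0] + distance);
      void* from_new_base = (void*)((intptr_t)&code_cache[0] + distance);
      std::printf("reaches stub from old location: %s\n",
                  from_old_base == (void*)&stub_target ? "yes" : "no");
      std::printf("reaches stub from new location: %s\n",
                  from_new_base == (void*)&stub_target ? "yes" : "no");
      // A relocation entry is exactly the bookkeeping that would fix up 'distance'
      // after the copy; without one, only an absolute address (load_const) stays valid.
      return 0;
    }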
------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:40 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:40 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: <6jY7k4mfzbXsBNHVCfbgk5K5ezzN6K9GBrAdJu5B1Uc=.283e4c65-d1a9-4c49-ab20-ea4a948f8fa4@github.com> <9ypU9IbcaaQBdf8U1-SkfwOHsh1qen18KQxw1bMiLUU=.219e7cc6-c9f8-48dd-9e9b-4b52bff7fa4d@github.com> <-65ZRyQ_R5YW64xfOXkpHMi6ITcv8Xe9CnbHmVTDyJ8=.0bc1ec2d-b9cc-40c8-b05f-f9c3a05288d8@github.com> Message-ID: <75bHfFpmwBf6_WuG9OY5W4x6KIuiYhHaza2bd7GIAGc=.1b920eab-0064-4512-85cc-c96b7a4767d9@github.com> On Tue, 11 Oct 2022 20:12:50 GMT, Lutz Schmidt wrote: >> After having another look at the description of `larl`, I don't think I understand why it would be a desirable replacement for `load_const`. >> >> I replaced >> `load_const(Z_R1_scratch, StubRoutines::zarch::nmethod_entry_barrier()); ... bcr(..., Z_R1_scratch);` >> with >> `larl(Z_R1_scratch, StubRoutines::zarch::nmethod_entry_barrier()); ... bcr(..., Z_R1_scratch);`. >> But this resulted in a bad jump since there is a mismatch between the address returned by `nmethod_entry_barrier()`, and the number of halfwords to be added to the current PC expected by `larl`. >> >> To make this work, it seems that I would need to compute the relative location between the current location and the address I want to jump to. Can I do this statically? > > I agree: stick with the load_const as you suggested. > You can't calculate the distance statically. The generated nmethod is copied around at least once when code generation is complete. You would need a relocation to handle that, and we don't have one. > Sorry for all the discussion which led to just nothing. No need to apologize! I appreciate the suggestions either way. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From mdoerr at openjdk.org Wed Oct 26 20:00:40 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Oct 2022 20:00:40 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 17:17:47 GMT, Tyler Steele wrote: >> src/hotspot/cpu/s390/gc/shared/barrierSetNMethod_s390.cpp line 49: >> >>> 47: >>> 48: public: >>> 49: static const int BARRIER_TOTAL_LENGTH = GUARD_INSTRUCTION_OFFSET + 2*6 + 2; // bytes >> >> Please either use 14 or something which matches the sequence: 4 (patchable constant) + 6 (larl) + 4 (bcr) > > GUARD_INSTRUCTION_OFFSET is the offset to the beginning of the patchable instruction. So, I believe it should be: 6 (cfi) + 6 (larl) + 2 (bcr). > > It may be worth renaming 'GUARD_INSTRUCTION_OFFSET' as I feel it's a bit confusing. Ok, I thought bcr was 4 bytes. That may be wrong. >> src/hotspot/cpu/s390/s390.ad line 851: >> >>> 849: Compile* C = ra_->C; >>> 850: C2_MacroAssembler _masm(&cbuf); >>> 851: // Register nmethod_tmp = Z_R3; >> >> Don't kill R3! I'd use R0 and R1 only. > > I would guess this is because, even though it's a volatile register, R3 can be a parameter or return value. So in the context of the compiler, we really don't want to use this register? > > BTW, I have now removed my changes to this file. R3 can contain a parameter at this point. You don't want to overwrite it. 
------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:40 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:40 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: <4sGpr9FKLgFFACLM_PbRZ6VemIqE98Fa7K1qsZ19BI8=.72413c89-e703-4fdc-8ac4-41bb8e078e65@github.com> On Tue, 11 Oct 2022 15:40:59 GMT, Martin Doerr wrote: >> I would guess this is because, even though it's a volatile register, R3 can be a parameter or return value. So in the context of the compiler, we really don't want to use this register? >> >> BTW, I have now removed my changes to this file. > > R3 can contain a parameter at this point. You don't want to overwrite it. Makes sense. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:40 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:40 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: <8dNcYSrSX1Yrjv70q3AYc8pALsUpCDdDcv7tZwaOXko=.b52b61c1-5062-4a5b-abad-b22447898b00@github.com> On Fri, 7 Oct 2022 10:11:46 GMT, Lutz Schmidt wrote: >> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2869: >> >>> 2867: // Save caller's sp & return_pc >>> 2868: __ push_frame(frame::z_abi_16_size); >>> 2869: __ z_stmg(Z_R14, Z_R15, _z_abi16(callers_sp), Z_SP); >> >> Please use `save_return_pc()`. Z_R15 = Z_SP is already saved by push_frame. > > You should not rely on Z_R14 being the return register in all circumstances and for all future. There are comments in the Principles of Operation and elsewhere that Z_R7 might assume a similar role. With save_return_pc() you are on the safe side and the code speaks for itself (no knowledge about Z_R14 required). Makes sense. I've made this change. >> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2875: >> >>> 2873: // We construct a pointer to the location of R14 stored above. >>> 2874: __ z_xgr(Z_R2, Z_R2); >>> 2875: __ z_ag(Z_R2, _z_abi(return_pc), 0, Z_SP); >> >> Please use a better fitting instruction like z_la. > > The code as written here would load the contents of the stack location reserved for the return pc. Less complicated: would load the return pc. If you want to load the address of the storage location, you would use > `z_la (Z_R2, _z_abi(return_pc), Z_R0, Z_SP);` > Load Address in it's various forms just does the address calculation, it does not access storage. > > And please, use Z_R0 (not just 0) to specify an optional, unspecified register - here and at all other places. Thanks for the feedback. I agree this is better, and have made this change as well. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:41 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:41 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 18:10:49 GMT, Tyler Steele wrote: >> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2876: >> >>> 2874: __ z_lay(Z_R1_scratch, -32, Z_R0, Z_R14); // R1 <- R14 - 32 >>> 2875: __ z_stg(Z_R1_scratch, _z_abi(carg_2), Z_R0, Z_SP); // SP[abi_carg2] <- R1 >>> 2876: __ z_la(Z_ARG1, _z_abi(carg_2), Z_R0, Z_SP); // R2 <- SP + abi_carg2 >> >> Z_ARG1 should point to the address _z_abi16(return_pc) + Z_SP in the caller frame. (Don't generate a copy!) 
That matches _z_abi16(return_pc) + current frame size + Z_SP in the current frame at this point. >> In addition, I'm missing save_volatile_gprs & restore_volatile_gprs for GP and FP regs. I think they should get saved directly before you use Z_ARG1 for the return pc address and restored after the call_VM_leaf + z_ltr(Z_RET, Z_RET) which needs to get moved before the restoration. Note that this will need extra stack space: (5 + 8) * BytesPerWord >> (See `MacroAssembler::verify_oop` for reference, but note that you don't need to include_flags which reduces complexity.) > >> Z_ARG1 should point to the address _z_abi16(return_pc) + Z_SP in the caller frame. > > This matches what the PPC implementation does, but when I do the same thing on s390 I get a cache miss in nmethod_stub_entry_barrier (the vm-call). It looked as though CodeCache::find_blob expects the address of the start of the compiled code, so I tried subtracting the size of the barrier from R14 (which currently points to end of the barrier in the compiled frame). After doing this I no longer saw the CodeCache miss. > >> I'm missing save_volatile_gprs & restore_volatile_gprs for GP and FP regs. > > I had been trying to get the volatile registers saved, but didn't have any luck. I tried it today with your suggestions and it worked like a charm. Not sure what the difference was. Thanks for the pointers. After a bit more investigation, I believe I see the reasoning behind using the suggested computation for the VM-Call's argument. It seems the CodeCache miss is the issue, not the argument, so I am focusing my efforts on understanding what is happening there. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From mdoerr at openjdk.org Wed Oct 26 20:00:41 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Oct 2022 20:00:41 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Wed, 19 Oct 2022 21:41:04 GMT, Tyler Steele wrote: >>> Z_ARG1 should point to the address _z_abi16(return_pc) + Z_SP in the caller frame. >> >> This matches what the PPC implementation does, but when I do the same thing on s390 I get a cache miss in nmethod_stub_entry_barrier (the vm-call). It looked as though CodeCache::find_blob expects the address of the start of the compiled code, so I tried subtracting the size of the barrier from R14 (which currently points to end of the barrier in the compiled frame). After doing this I no longer saw the CodeCache miss. >> >>> I'm missing save_volatile_gprs & restore_volatile_gprs for GP and FP regs. >> >> I had been trying to get the volatile registers saved, but didn't have any luck. I tried it today with your suggestions and it worked like a charm. Not sure what the difference was. Thanks for the pointers. > > After a bit more investigation, I believe I see the reasoning behind using the suggested computation for the VM-Call's argument. It seems the CodeCache miss is the issue, not the argument, so I am focusing my efforts on understanding what is happening there. You just used the return_pc address of the wrong frame. 
------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:41 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:41 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Thu, 20 Oct 2022 12:22:27 GMT, Martin Doerr wrote: >> After a bit more investigation, I believe I see the reasoning behind using the suggested computation for the VM-Call's argument. It seems the CodeCache miss is the issue, not the argument, so I am focusing my efforts on understanding what is happening there. > > You just used the return_pc address of the wrong frame. Right. I see the difference mentioned in your other comment. I have made this change. Thanks for clarifying. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From lucy at openjdk.org Wed Oct 26 20:00:41 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Oct 2022 20:00:41 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: <64lENqcjoFCceNSLxNfmN6Kv3HNWundZqMHl-yvoSvk=.f02fb437-d92d-40ca-9da6-53c1e94df259@github.com> References: <64lENqcjoFCceNSLxNfmN6Kv3HNWundZqMHl-yvoSvk=.f02fb437-d92d-40ca-9da6-53c1e94df259@github.com> Message-ID: On Fri, 7 Oct 2022 17:02:25 GMT, Tyler Steele wrote: >> src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2891: >> >>> 2889: __ z_cfi(Z_R2, 0); >>> 2890: __ z_bcr(Assembler::bcondNotEqual, Z_R14); >>> 2891: >> >> `z_ltr(Z_R2, Z_R2);` >> is the preferred way of testing register contents. I would then use the "speaking" alias >> `z_brnz(Z_R14); >> Note: there are subtle semantic differences between "not equal" and "not zero". See assembler_s390.hpp. > > I changed to the preferred test instruction. However, it seems like z_brnz won't work in this situation because the branch address is in a register and not a label. > > Side note it too me a second to realize that I am using bcr and brnz is an alias for brc. It seems odd that there isn't a "speaking" alias for bcr (or maybe I'm just not finding it). You are right. There are no "speaking" aliases for brc. "Branch on Condition" is not used frequently in VM code. Therefore, nobody felt enough pain so far to define the aliases. May I please request you use the condition `Assembler::bcondNotZero`? `LTR` compares the register value against zero. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:41 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:41 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: <64lENqcjoFCceNSLxNfmN6Kv3HNWundZqMHl-yvoSvk=.f02fb437-d92d-40ca-9da6-53c1e94df259@github.com> Message-ID: On Mon, 10 Oct 2022 09:18:47 GMT, Lutz Schmidt wrote: >> I changed to the preferred test instruction. However, it seems like z_brnz won't work in this situation because the branch address is in a register and not a label. >> >> Side note it too me a second to realize that I am using bcr and brnz is an alias for brc. It seems odd that there isn't a "speaking" alias for bcr (or maybe I'm just not finding it). > > You are right. There are no "speaking" aliases for brc. "Branch on Condition" is not used frequently in VM code. Therefore, nobody felt enough pain so far to define the aliases. > May I please request you use the condition `Assembler::bcondNotZero`? `LTR` compares the register value against zero. 
I'm happy to change bcondNotEqual -> bcondNotZero, especially since it seems to more clearly represent the intent. Thanks for the suggestion. Out of curiosity: How are they different? You mentioned above that there are semantic differences between not equal and not zero, but in assembler_s390.hpp, it looks like bcondNotZero is actually an alias for bcondNotEqual. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From lucy at openjdk.org Wed Oct 26 20:00:41 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Oct 2022 20:00:41 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: <64lENqcjoFCceNSLxNfmN6Kv3HNWundZqMHl-yvoSvk=.f02fb437-d92d-40ca-9da6-53c1e94df259@github.com> Message-ID: <5fug7V0nhfSPBIG3tAMTQHhGGEe1YTM8dWuL2ZjSVyM=.6c70e515-f3bb-4233-8dae-46bf0cd1a567@github.com> On Tue, 11 Oct 2022 15:27:44 GMT, Tyler Steele wrote: >> You are right. There are no "speaking" aliases for brc. "Branch on Condition" is not used frequently in VM code. Therefore, nobody felt enough pain so far to define the aliases. >> May I please request you use the condition `Assembler::bcondNotZero`? `LTR` compares the register value against zero. > > I'm happy to change bcondNotEqual -> bcondNotZero, especially since it seems to more clearly represent the intent. Thanks for the suggestion. > > Out of curiosity: How are they different? You mentioned above that there are semantic differences between not equal and not zero, but in assembler_s390.hpp, it looks like bcondNotZero is actually an alias for bcondNotEqual. You are right. The two conditions are technically identical. There is a semantic difference. If you say "NotEqual", you imply that you compared two (arbitrary) values. If you say "NotZero", it is clear that you tested the value against zero. When you use LTR to test a value and copy it at the same time, it may be confusing for the not so profound s390 hacker. Example: z_ltr(Z_R1, Z_R2); z_bcr(Assembler::bcondNotEqual, R14); Was the contents of the registers compared before r2 was copied into r1? For sure not, you say. But that's because you are a profound s390 hacker meanwhile. :-) ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Wed Oct 26 20:00:41 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 26 Oct 2022 20:00:41 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: <5fug7V0nhfSPBIG3tAMTQHhGGEe1YTM8dWuL2ZjSVyM=.6c70e515-f3bb-4233-8dae-46bf0cd1a567@github.com> References: <64lENqcjoFCceNSLxNfmN6Kv3HNWundZqMHl-yvoSvk=.f02fb437-d92d-40ca-9da6-53c1e94df259@github.com> <5fug7V0nhfSPBIG3tAMTQHhGGEe1YTM8dWuL2ZjSVyM=.6c70e515-f3bb-4233-8dae-46bf0cd1a567@github.com> Message-ID: On Tue, 11 Oct 2022 15:59:33 GMT, Lutz Schmidt wrote: >> I'm happy to change bcondNotEqual -> bcondNotZero, especially since it seems to more clearly represent the intent. Thanks for the suggestion. >> >> Out of curiosity: How are they different? You mentioned above that there are semantic differences between not equal and not zero, but in assembler_s390.hpp, it looks like bcondNotZero is actually an alias for bcondNotEqual. > > You are right. The two conditions are technically identical. There is a semantic difference. If you say "NotEqual", you imply that you compared two (arbitrary) values. If you say "NotZero", it is clear that you tested the value against zero. When you use LTR to test a value and copy it at the same time, it may be confusing for the not so profound s390 hacker. 
Example: > > z_ltr(Z_R1, Z_R2); > z_bcr(Assembler::bcondNotEqual, R14); > > Was the contents of the registers compared before r2 was copied into r1? For sure not, you say. But that's because you are a profound s390 hacker meanwhile. :-) I see the reasoning here; thanks for clarifying. Thanks also for your kind words. I'm not sure that I quite at the level of s390 hacker indicated above, but I'm working on it :-) ------------- PR: https://git.openjdk.org/jdk/pull/10558 From jnimeh at openjdk.org Wed Oct 26 20:48:24 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Wed, 26 Oct 2022 20:48:24 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Wed, 26 Oct 2022 15:47:08 GMT, vpaprotsk wrote: >> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 171: >> >>> 169: } >>> 170: >>> 171: if (len >= 1024) { >> >> Out of curiosity, do you have any perf numbers for the impact of this change on systems that do not support AVX512? Does this help or hurt (or make a negligible impact) on poly1305 updates when the input is 1K or larger? > > (The first commit in this PR actually has the code without the check if anyone wants to measure.. well its also trivial to edit..) > > I measured about 50% slowdown on 64 byte payloads. One could argue that 64 bytes is not all that representative, but we don't get much out of assembler at that load either so it didn't seem worth it to figure out some sort of platform check. > > AVX512 needs at least 256 = 16 blocks.. there is overhead also pre-calculating powers of R that needs to be amortized. Assembler does fall back to 64-bit multiplies for <256, while the Java version will have to use the 32-bit multiplies. <256, purely scalar, non-vector, 64 vs 32 is not _that_ big an issue though; the algorithm is plenty happy with 26-bit limbs, and whatever the benefit of 64, it gets erased by the interface-matching code copying limbs in and out.. > > Right now, I measured 1k with `-XX:-UsePolyIntrinsics` to be about 10% slower. I think its acceptable, in order to get 18x? > > Most/all of the slowdown comes from this need of copying limbs out/in.. I am looking at perhaps copying limbs out in the intrinsic instead. Not very 'pretty'.. limbs are hidden in a nested private class behind an interface.. I would be breaking what is a good design with neat encapsulation. (I accidentally forced-pushed that earlier, if you are curious; non-working). The current version of this code seems more robust in the long term? 10% is not a negligible impact. I see your point about AVX512 reaping the rewards of this change, but there are plenty of x86_64 systems without AVX512 that will be impacted, not to mention other platforms like aarch64 which (for this change at least) will never see the benefits from the intrinsic. I don't have any suggestions right at this moment for how this could be streamlined at all to help reduce the pain for non-AVX512 systems. Worth looking into though. 
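Since the 26-bit limb representation comes up in the quoted explanation above, here is a tiny self-contained C++ illustration of what it means. This shows only the packing and unpacking of limbs, not the Poly1305 modular arithmetic, and it is not code from the PR:

    #include <cstdint>
    #include <cstdio>

    int main() {
      const uint64_t MASK26 = (1ULL << 26) - 1;
      uint64_t x = 0x0123456789abcdefULL;     // some 64-bit slice of a wide accumulator

      uint64_t limb0 = x & MASK26;             // bits  0..25
      uint64_t limb1 = (x >> 26) & MASK26;     // bits 26..51
      uint64_t limb2 = x >> 52;                // bits 52..63

      uint64_t rebuilt = limb0 | (limb1 << 26) | (limb2 << 52);
      std::printf("round trip ok: %s\n", rebuilt == x ? "yes" : "no");

      // The point of keeping limbs at 26 bits: the product of two limbs is at most
      // 52 bits, so limb products plus a few carry bits always fit in 64 bits.
      return 0;
    }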
------------- PR: https://git.openjdk.org/jdk/pull/10582 From iveresov at openjdk.org Wed Oct 26 20:49:33 2022 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 26 Oct 2022 20:49:33 GMT Subject: RFR: 8295066: Folding of loads is broken in C2 after JDK-8242115 In-Reply-To: References: Message-ID: <3judUFx-evWUXwoahXsErqBiA8XwbQpBLuJQU4HqnSE=.a7e1d5e8-e18c-4d1e-9d8b-f69bfc6f045e@github.com> On Tue, 25 Oct 2022 19:50:10 GMT, Igor Veresov wrote: > The fix does two things: > > 1. Allow folding of pinned loads to constants with a straight line data flow (no phis). > 2. Make scalarization aware of the new shape of the barriers so that pre-loads can be ignored. > > Testing is clean, Valhalla testing is clean too. Thanks for the reviews! ------------- PR: https://git.openjdk.org/jdk/pull/10861 From iveresov at openjdk.org Wed Oct 26 20:49:34 2022 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 26 Oct 2022 20:49:34 GMT Subject: Integrated: 8295066: Folding of loads is broken in C2 after JDK-8242115 In-Reply-To: References: Message-ID: On Tue, 25 Oct 2022 19:50:10 GMT, Igor Veresov wrote: > The fix does two things: > > 1. Allow folding of pinned loads to constants with a straight line data flow (no phis). > 2. Make scalarization aware of the new shape of the barriers so that pre-loads can be ignored. > > Testing is clean, Valhalla testing is clean too. This pull request has now been integrated. Changeset: 58a7141a Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/58a7141a0dea5d1b4bfe6d56a95d860c854b3461 Stats: 260 lines in 9 files changed: 178 ins; 46 del; 36 mod 8295066: Folding of loads is broken in C2 after JDK-8242115 Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/10861 From mdoerr at openjdk.org Wed Oct 26 20:52:15 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 26 Oct 2022 20:52:15 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Tue, 4 Oct 2022 14:27:09 GMT, Tyler Steele wrote: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. LGTM. You may want to fix the comment, too. Please wait for the 2nd review before pushing. src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2888: > 2886: > 2887: // Check return val of vm call > 2888: // if (return val != 0) Comment should also get updated. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.org/jdk/pull/10558 From jvernee at openjdk.org Wed Oct 26 20:57:44 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 26 Oct 2022 20:57:44 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. 
I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic I agree with David. Unconditionally doing a check on every call seems to be overkill, since it's a mostly theoretical problem at this point, and in general I think we should be able to assume that foreign code respects the ABI. There are other things that can go wrong as well, such as foreign code installing a signal handler, which can break implicit null checks. Other things like the foreign code returning with corrupted register state, which then leads to further corruption, is also a possibility. i.e. there seem to be many more things that can go wrong if we expect native code to violate the ABI. Even though the check can be pretty fast, we've seen that people watch the performance in this area closely, and care about every nanosecond spent here. On my own box, the `panama_blank` benchmark takes just 3.4ns, so the relative overhead could be larger depending on the machine, it seems. There was also recently a flag added to speed up native calls, namely `-XX:+UseSystemMemoryBarrier`. This could further make the relative overhead of a check larger. All in all, I think `-Xcheck:jni` is a better place to test this kind of stuff, and encourage people to run tests with `-Xcheck:jni` before deploying to production. But, at the same time, loading libraries is a known problematic situation, and there the performance matters far less. I'd say always checking and restoring the FPU control state, and perhaps emitting a warning message to spur people on to fix the issue in the long term, seems like a good solution to me. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From jnimeh at openjdk.org Wed Oct 26 21:15:25 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Wed, 26 Oct 2022 21:15:25 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Wed, 26 Oct 2022 20:45:57 GMT, Jamil Nimeh wrote: >> (The first commit in this PR actually has the code without the check if anyone wants to measure.. well its also trivial to edit..) >> >> I measured about 50% slowdown on 64 byte payloads. One could argue that 64 bytes is not all that representative, but we don't get much out of assembler at that load either so it didn't seem worth it to figure out some sort of platform check. >> >> AVX512 needs at least 256 = 16 blocks.. there is overhead also pre-calculating powers of R that needs to be amortized. Assembler does fall back to 64-bit multiplies for <256, while the Java version will have to use the 32-bit multiplies. <256, purely scalar, non-vector, 64 vs 32 is not _that_ big an issue though; the algorithm is plenty happy with 26-bit limbs, and whatever the benefit of 64, it gets erased by the interface-matching code copying limbs in and out.. >> >> Right now, I measured 1k with `-XX:-UsePolyIntrinsics` to be about 10% slower. I think its acceptable, in order to get 18x? >> >> Most/all of the slowdown comes from this need of copying limbs out/in.. I am looking at perhaps copying limbs out in the intrinsic instead. Not very 'pretty'.. limbs are hidden in a nested private class behind an interface.. 
I would be breaking what is a good design with neat encapsulation. (I accidentally forced-pushed that earlier, if you are curious; non-working). The current version of this code seems more robust in the long term? > > 10% is not a negligible impact. I see your point about AVX512 reaping the rewards of this change, but there are plenty of x86_64 systems without AVX512 that will be impacted, not to mention other platforms like aarch64 which (for this change at least) will never see the benefits from the intrinsic. > > I don't have any suggestions right at this moment for how this could be streamlined at all to help reduce the pain for non-AVX512 systems. Worth looking into though. One small thing maybe: It doesn't look like R in `processMultipleBlocks` and `rbytes` ever changes, so maybe there's no need to repeatedly serialize/deserialize them on every call to engineUpdate? There is already an `r` that is attached to the object that is an IntegerModuloP. Could that be used in `processMultipleBlocks` and perhaps a private byte[] for a serialized r is also a field in Poly1305 that can be passed into the intrinsic method rather than creating it every time? It could be set in `setRSVals`. Perhaps we can recover a little performance there? ------------- PR: https://git.openjdk.org/jdk/pull/10582 From lucy at openjdk.org Wed Oct 26 21:22:27 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 26 Oct 2022 21:22:27 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Tue, 4 Oct 2022 14:27:09 GMT, Tyler Steele wrote: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. LGTM as well, finally! Congratulations and thank you for your patience and perseverance. ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.org/jdk/pull/10558 From dholmes at openjdk.org Wed Oct 26 22:29:00 2022 From: dholmes at openjdk.org (David Holmes) Date: Wed, 26 Oct 2022 22:29:00 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v2] In-Reply-To: <1bfoEKP2K6f7eJ-FyU5uNN7L2TWWxUVX8U_x0FJitDY=.d7188708-562e-48f6-a18a-8d244d4dd42c@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> <1bfoEKP2K6f7eJ-FyU5uNN7L2TWWxUVX8U_x0FJitDY=.d7188708-562e-48f6-a18a-8d244d4dd42c@github.com> Message-ID: On Wed, 26 Oct 2022 19:22:34 GMT, Matias Saavedra Silva wrote: >> As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. >> >> The text format and contents are tentative, please review. 
>> >> Here is an example output when using `findmethod()`: >> >> "Executing findmethod" >> flags (bitmask): >> 0x01 - print names of methods >> 0x02 - print bytecodes >> 0x04 - print the address of bytecodes >> 0x08 - print info for invokedynamic >> 0x10 - print info for invokehandle >> >> [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} >> 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V >> 0 iconst_0 >> 1 istore_1 >> 2 iload_1 >> 3 iconst_2 >> 4 if_icmpge 24 >> 7 getstatic 7 >> 10 invokedynamic bsm=31 13 >> BSM: REF_invokeStatic 32 >> arguments[1] = { >> 000 >> } >> ConstantPoolCacheEntry: 4 >> - this: 0x00007fffa0400570 >> - bytecode 1: invokedynamic ba >> - bytecode 2: nop 00 >> - cp index: 13 >> - F1: [ 0x00000008000c8658] >> - F2: [ 0x0000000000000003] >> - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) >> - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] >> - tos: object >> - local signature: 1 >> - has appendix: 1 >> - forced virtual: 0 >> - final: 1 >> - virtual Final: 0 >> - resolution Failed: 0 >> - num Parameters: 02 >> Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; >> appendix: java.lang.invoke.BoundMethodHandle$Species_LL >> {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' >> - ---- fields (total size 5 words): >> - private 'customizationCount' 'B' @12 0 (0x00) >> - private volatile 'updateInProgress' 'Z' @13 false (0x00) >> - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) >> - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) >> - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) >> - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) >> - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) >> - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) >> ------------- >> 15 putstatic 17 >> 18 iinc #1 1 >> 21 goto 2 >> 24 return > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Added null check and resource mark Updates look good. One further nit, but otherwise seems okay. But someone who is actively going to be using this needs to comment on the format/content changes. Thanks. src/hotspot/share/oops/cpCache.cpp line 653: > 651: bytecode_1() == Bytecodes::_invokedynamic)) { > 652: oop appendix = appendix_if_resolved(cph); > 653: if (m != NULL) { Please use nullptr to be consistent with earlier null check. 
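On the NULL vs nullptr nit above: the request is about consistency with the earlier check, but for what it is worth, nullptr is also the safer spelling in C++ overload resolution. A tiny standalone example, not HotSpot code:

    #include <cstdio>

    static void probe(int)  { std::puts("probe(int)"); }
    static void probe(int*) { std::puts("probe(int*)"); }

    int main() {
      probe(0);        // literal 0 converts to int: picks probe(int)
      probe(nullptr);  // nullptr can only be a pointer: picks probe(int*)
      // probe(NULL);  // ambiguous or surprising, depending on how NULL is defined
      return 0;
    }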
------------- PR: https://git.openjdk.org/jdk/pull/10860 From dzhang at openjdk.org Thu Oct 27 05:43:28 2022 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 27 Oct 2022 05:43:28 GMT Subject: RFR: 8295967: RISC-V: Support negVI/negVL instructions for Vector API Message-ID: Hi, This patch will add support of `NegVI`, `NegVL` for RISC-V and was implemented by referring to riscv-v-spec v1.0 [1]. Tests are performed on qemu with parameter `-cpu rv64,v=true,vlen=256,vext_spec=v1.0`. By adding the `-XX:+PrintAssembly -Xcomp -XX:-TieredCompilation -XX:+LogCompilation -XX:LogFile=compile.log` parameter when executing the test cases[2] [3] , the compilation log is as follows: 100 B16: # out( B37 B17 ) <- in( B15 ) Freq: 77.0109 100 # castII of R9, #@castII 100 addw R29, R9, zr #@convI2L_reg_reg 104 slli R29, R29, (#2 & 0x3f) #@lShiftL_reg_imm 108 add R12, R30, R29 # ptr, #@addP_reg_reg 10c addi R12, R12, #16 # ptr, #@addP_reg_imm 110 vle V1, [R12] #@loadV 118 vrsub.vx V1, V1, V1 #@vnegI 120 bgeu R9, R10, B37 #@cmpU_branch P=0.000001 C=-1.000000 At the same time, the following assembly code will be generated: 0x000000400ccfa618: .4byte 0x10072d7 0x000000400ccfa61c: .4byte 0xe1040d7 ;*invokestatic unaryOp {reexecute=0 rethrow=0 return_oop=0} ; - jdk.incubator.vector.IntVector::lanewiseTemplate at 91 (line 684) ; - jdk.incubator.vector.Int256Vector::lanewise at 2 (line 273) ; - jdk.incubator.vector.Int256Vector::lanewise at 2 (line 41) ; - Int256VectorTests::NEGInt256VectorTests at 73 (line 5216) PS: `0x10072d7/0xe1040d7` are the machine code for `vsetvli/vrsub`. After we implement these nodes, by using `-XX:+UseRVV`, the number of assembly instructions is reduced by about ~50% because of the different execution paths with the number of loops, similar to `AddTest` [4]. In the meantime, I also add an assembly pseudoinstruction `vneg.v` in macroAssembler_riscv. [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#111-vector-single-width-integer-add-and-subtract [2] https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector/Int256VectorTests.java [3] https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector/Long256VectorTests.java [4] https://github.com/zifeihan/vector-api-test-rvv/blob/master/vector-api-rvv-performance.md Please take a look and have some reviews. Thanks a lot. ## Testing: - hotspot and jdk tier1 on unmatched board without new failures - test/jdk/jdk/incubator/vector/Int256VectorTests.java with fastdebug on qemu - test/jdk/jdk/incubator/vector/Long256VectorTests.java with fastdebug on qemu ------------- Commit messages: - Add vnegI/vnegL C2 instructions for Vector api Changes: https://git.openjdk.org/jdk/pull/10880/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10880&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295967 Stats: 29 lines in 3 files changed: 29 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10880.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10880/head:pull/10880 PR: https://git.openjdk.org/jdk/pull/10880 From eosterlund at openjdk.org Thu Oct 27 05:46:40 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 27 Oct 2022 05:46:40 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: <0zZTLgMTZydV7hKs0PrllytwHZ-6vLvkNi7L2sgWgJs=.e6d801ef-fa8a-4c9e-a7f7-658f90f3331b@github.com> On Tue, 4 Oct 2022 14:27:09 GMT, Tyler Steele wrote: > This draft PR implements native method barriers on s390. 
When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Looks like you missed implementing the c2i entry barrier that comes hand in hand with the nmethod entry barriers. Once you implemented that it looks like you need to spill more registers in g1_write_barrier_pre() in g1BarrierSetAssembler_s390.cpp with this change as well, as all java arguments are live at that point and with G1 the phantom load of the method holder will call the pre_write barrier which occasionally will call the runtime. At that point, even floating point arguments must be saved. The current s390 g1 pre write barrier seems to assume it is only called from the interpreter, I believe. All other platforms went through the same dance when adding nmethod entry barriers. ------------- Changes requested by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/10558 From iklam at openjdk.org Thu Oct 27 05:59:19 2022 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 27 Oct 2022 05:59:19 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v2] In-Reply-To: <1bfoEKP2K6f7eJ-FyU5uNN7L2TWWxUVX8U_x0FJitDY=.d7188708-562e-48f6-a18a-8d244d4dd42c@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> <1bfoEKP2K6f7eJ-FyU5uNN7L2TWWxUVX8U_x0FJitDY=.d7188708-562e-48f6-a18a-8d244d4dd42c@github.com> Message-ID: On Wed, 26 Oct 2022 19:22:34 GMT, Matias Saavedra Silva wrote: >> As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. >> >> The text format and contents are tentative, please review. 
>> >> Here is an example output when using `findmethod()`: >> >> "Executing findmethod" >> flags (bitmask): >> 0x01 - print names of methods >> 0x02 - print bytecodes >> 0x04 - print the address of bytecodes >> 0x08 - print info for invokedynamic >> 0x10 - print info for invokehandle >> >> [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} >> 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V >> 0 iconst_0 >> 1 istore_1 >> 2 iload_1 >> 3 iconst_2 >> 4 if_icmpge 24 >> 7 getstatic 7 >> 10 invokedynamic bsm=31 13 >> BSM: REF_invokeStatic 32 >> arguments[1] = { >> 000 >> } >> ConstantPoolCacheEntry: 4 >> - this: 0x00007fffa0400570 >> - bytecode 1: invokedynamic ba >> - bytecode 2: nop 00 >> - cp index: 13 >> - F1: [ 0x00000008000c8658] >> - F2: [ 0x0000000000000003] >> - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) >> - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] >> - tos: object >> - local signature: 1 >> - has appendix: 1 >> - forced virtual: 0 >> - final: 1 >> - virtual Final: 0 >> - resolution Failed: 0 >> - num Parameters: 02 >> Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; >> appendix: java.lang.invoke.BoundMethodHandle$Species_LL >> {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' >> - ---- fields (total size 5 words): >> - private 'customizationCount' 'B' @12 0 (0x00) >> - private volatile 'updateInProgress' 'Z' @13 false (0x00) >> - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) >> - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) >> - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) >> - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) >> - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) >> - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) >> ------------- >> 15 putstatic 17 >> 18 iinc #1 1 >> 21 goto 2 >> 24 return > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Added null check and resource mark For testing the handling of uninitialized ConstantPoolEntry's, you can do something like this: - set a breakpoint at InstanceKlass::initialize_impl in gdb - when the breakpoint is hit, check if constants()->cache() is NULL - if not NULL, do this: `call constants()->cache()->print_on(tty)` This should catch the problem in your earlier commit where you didn't check for the NULL Method pointer. 
------------- PR: https://git.openjdk.org/jdk/pull/10860 From rehn at openjdk.org Thu Oct 27 07:16:24 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 27 Oct 2022 07:16:24 GMT Subject: RFR: 8233697: CHT: Iteration parallelization [v5] In-Reply-To: References: <5kEWbR4jwntZSgR0Y-xBfJ_rxSWnHntQe9ixuKVDANU=.ee848728-217e-4300-888b-8070ffe2d276@github.com> Message-ID: On Wed, 26 Oct 2022 16:27:31 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to add parallel iteration of the ConcurrentHashTable. The iteration should be done during a safepoint without concurrent modifications to the ConcurrentHashTable. >> >> Usecase is in parallelizing the merging of large remsets for G1. >> >> Some background: The problem is that particularly during (G1) mixed gc it happens that the distribution of contents in the CHT is very unbalanced - young gen regions have a very small remembered set (little work), and old gen regions very large ones (much work). >> >> Since the current work distribution is based on whole remembered sets (i.e. CHTs), this makes for a very unbalanced merge remsets phase in G1 when you have quite a bit more than the number of old gen regions threads at your disposal. >> This negatively impacts pause time predictions (and obviously pause times are longer than necessary as many threads are idling to wait for the phase to complete). >> >> This change only adds the infrastructure code in the CHT, there will be a follow-up with G1 changes. >> >> Testing: tier 1-3 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Robbin suggestion to use BucketsOperation Looks good, thanks! ------------- Marked as reviewed by rehn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10759 From jbhateja at openjdk.org Thu Oct 27 09:39:32 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 27 Oct 2022 09:39:32 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Mon, 24 Oct 2022 22:09:29 GMT, vpaprotsk wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. >> - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. I do think we should detect (R==0 || S ==0) so would like advice please. >> - Added a JMH perf test. >> - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. >> >> Perf before: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s >> >> and after: >> >> Benchmark (dataSize) (provider) Mode Cnt Score Error Units >> Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 
154528.057 ops/s >> Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s >> Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s >> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s >> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s > > vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: > > extra whitespace character Few other non-algorithm change set comments. src/hotspot/cpu/x86/macroAssembler_x86_poly.cpp line 22: > 20: * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA > 21: * or visit www.oracle.com if you need additional information or have any > 22: * questions. Of late stub code has been re-organized, to comply with it you may want to remove this file and merge macro-assembly code into a new file stubGenerator_x86_64_poly.cpp on the lines of src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp src/hotspot/cpu/x86/macroAssembler_x86_poly.cpp line 849: > 847: jcc(Assembler::less, L_process16Loop); > 848: > 849: poly1305_process_blocks_avx512(input, length, Since entire code is based on 512 bit encoding misalignment penalty may be costly here. A scalar peel handling (as done in tail) for input portion before a 64 byte aligned address could further improve the performance for large block sizes. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2040: > 2038: > 2039: address StubGenerator::generate_poly1305_processBlocks() { > 2040: __ align64(); This can be replaced by __ align(CodeEntryAlignment); src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175: > 173: // Choice of 1024 is arbitrary, need enough data blocks to amortize conversion overhead > 174: // and not affect platforms without intrinsic support > 175: int blockMultipleLength = (len/BLOCK_LENGTH) * BLOCK_LENGTH; Since Poly processes 16 byte chunks, a strength reduced version of above expression could be len & (~(BLOCK_LEN-1) test/micro/org/openjdk/bench/javax/crypto/full/Poly1305DigestBench.java line 94: > 92: throw new RuntimeException(ex); > 93: } > 94: } On CLX patch shows performance regression of about 10% for block size 1024-2048+. 
CLX (Non-IFMA target)

Baseline (JDK-20):
Benchmark (dataSize) (provider) Mode Cnt Score Error Units
Poly1305DigestBench.digest 64 thrpt 2 3128928.978 ops/s
Poly1305DigestBench.digest 256 thrpt 2 1526452.083 ops/s
Poly1305DigestBench.digest 1024 thrpt 2 509267.401 ops/s
Poly1305DigestBench.digest 2048 thrpt 2 305784.922 ops/s
Poly1305DigestBench.digest 4096 thrpt 2 142175.885 ops/s
Poly1305DigestBench.digest 8192 thrpt 2 72142.906 ops/s
Poly1305DigestBench.digest 16384 thrpt 2 36357.000 ops/s
Poly1305DigestBench.digest 1048576 thrpt 2 676.142 ops/s

With opt:
Benchmark (dataSize) (provider) Mode Cnt Score Error Units
Poly1305DigestBench.digest 64 thrpt 2 3136204.416 ops/s
Poly1305DigestBench.digest 256 thrpt 2 1683221.124 ops/s
Poly1305DigestBench.digest 1024 thrpt 2 457432.172 ops/s
Poly1305DigestBench.digest 2048 thrpt 2 277563.817 ops/s
Poly1305DigestBench.digest 4096 thrpt 2 149393.357 ops/s
Poly1305DigestBench.digest 8192 thrpt 2 79463.734 ops/s
Poly1305DigestBench.digest 16384 thrpt 2 41083.730 ops/s
Poly1305DigestBench.digest 1048576 thrpt 2 705.419 ops/s
------------- PR: https://git.openjdk.org/jdk/pull/10582 From jbhateja at openjdk.org Thu Oct 27 09:39:33 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 27 Oct 2022 09:39:33 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Wed, 26 Oct 2022 21:11:33 GMT, Jamil Nimeh wrote: >> 10% is not a negligible impact. I see your point about AVX512 reaping the rewards of this change, but there are plenty of x86_64 systems without AVX512 that will be impacted, not to mention other platforms like aarch64 which (for this change at least) will never see the benefits from the intrinsic. >> >> I don't have any suggestions right at this moment for how this could be streamlined at all to help reduce the pain for non-AVX512 systems. Worth looking into though. > > One small thing maybe: It doesn't look like R in `processMultipleBlocks` and `rbytes` ever changes, so maybe there's no need to repeatedly serialize/deserialize them on every call to engineUpdate? There is already an `r` that is attached to the object that is an IntegerModuloP. Could that be used in `processMultipleBlocks` and perhaps a private byte[] for a serialized r is also a field in Poly1305 that can be passed into the intrinsic method rather than creating it every time? It could be set in `setRSVals`. Perhaps we can recover a little performance there? > 10% is not a negligible impact. I see your point about AVX512 reaping the rewards of this change, but there are plenty of x86_64 systems without AVX512 that will be impacted, not to mention other platforms like aarch64 which (for this change at least) will never see the benefits from the intrinsic. > > I don't have any suggestions right at this moment for how this could be streamlined at all to help reduce the pain for non-AVX512 systems. Worth looking into though. Do you suggest using white box APIs for CPU feature query during Poly1305 static initialization, performing the multi-block partitioning only on relevant platforms, and keeping the original implementation sacrosanct for other targets? The VM does offer native white box primitives; they are currently used by the test infrastructure.
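To make the two ideas in this exchange concrete -- gate the block-multiple path per platform once at initialization, and only take it for large inputs -- here is a schematic, self-contained C++ sketch. The probe function is a placeholder invented for this sketch (it is not a real VM or WhiteBox API), and the 1024-byte threshold and 16-byte block size are simply the numbers quoted in the thread:

    #include <cstddef>
    #include <cstdio>

    static bool platform_prefers_bulk_path() { return true; }       // placeholder probe
    static const bool kUseBulkPath = platform_prefers_bulk_path();  // decided once

    static void process_bulk(const unsigned char*, std::size_t n)   { std::printf("bulk path:   %zu bytes\n", n); }
    static void process_serial(const unsigned char*, std::size_t n) { std::printf("serial path: %zu bytes\n", n); }

    static void update(const unsigned char* in, std::size_t len) {
      const std::size_t BLOCK = 16;
      if (kUseBulkPath && len >= 1024) {
        std::size_t bulk = len & ~(BLOCK - 1);   // largest 16-byte multiple within len
        process_bulk(in, bulk);
        in += bulk;
        len -= bulk;
      }
      process_serial(in, len);                   // remainder, and all small inputs
    }

    int main() {
      unsigned char buf[3000] = {0};
      update(buf, sizeof(buf));   // expect: bulk 2992 bytes, then serial 8 bytes
      update(buf, 256);           // below threshold: everything on the serial path
      return 0;
    }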
------------- PR: https://git.openjdk.org/jdk/pull/10582 From jsjolen at openjdk.org Thu Oct 27 10:32:38 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 27 Oct 2022 10:32:38 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v4] In-Reply-To: References: Message-ID: > Hi! > > This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: Remove WizardMode && Verbose as per Coleen, add back WizardMode as per DHolmes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10645/files - new: https://git.openjdk.org/jdk/pull/10645/files/5b9f53d3..b375c9f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10645&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10645&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10645.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10645/head:pull/10645 PR: https://git.openjdk.org/jdk/pull/10645 From rkennke at openjdk.org Thu Oct 27 10:58:12 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 27 Oct 2022 10:58:12 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v2] In-Reply-To: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: > There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: > - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. > - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. 
> > Testing: > - [x] GHA (x86 and x-compile failures look like infra glitch) > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [x] tier4 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix has_owner() condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10849/files - new: https://git.openjdk.org/jdk/pull/10849/files/04c780d0..37fa31bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10849&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10849&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10849.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10849/head:pull/10849 PR: https://git.openjdk.org/jdk/pull/10849 From rkennke at openjdk.org Thu Oct 27 10:58:13 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 27 Oct 2022 10:58:13 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v2] In-Reply-To: References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: On Wed, 26 Oct 2022 19:05:50 GMT, Daniel D. Daugherty wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix has_owner() condition > > src/hotspot/share/runtime/objectMonitor.inline.hpp line 62: > >> 60: void* owner = owner_raw(); >> 61: return owner != NULL || owner == DEFLATER_MARKER; >> 62: } > > Why does has_owner() return `true` when `owner == DEFLATER_MARKER`? > I'm only seeing one caller to the new `has_owner()` function in > `ThreadService::find_deadlocks_at_safepoint()` and I don't understand why > that code needs to think `has_owner()` needs to be `true` if the target > ObjectMonitor is being deflated. > > That new `has_owner()` call will result in calling `Threads::owning_thread_from_monitor()` > with `waitingToLockMonitor` which is being deflated. So the return from > `Threads::owning_thread_from_monitor()` will be `NULL` which will result > in us taking the `num_deadlocks++` code path. If I'm reading this right, then > we'll report a deflating monitor as being in a deadlock. What am I missing here? Right, good catch. I got the condition the wrong way. I just pushed a fix. ------------- PR: https://git.openjdk.org/jdk/pull/10849 From mdoerr at openjdk.org Thu Oct 27 11:05:33 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Oct 2022 11:05:33 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Tue, 4 Oct 2022 14:27:09 GMT, Tyler Steele wrote: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Good catch! Is the c2i entry barrier really required without concurrent class unloading? ------------- PR: https://git.openjdk.org/jdk/pull/10558 From jsjolen at openjdk.org Thu Oct 27 11:47:29 2022 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 27 Oct 2022 11:47:29 GMT Subject: RFR: 8295060: Port PrintDeoptimizationDetails to UL [v4] In-Reply-To: References: Message-ID: On Thu, 27 Oct 2022 10:32:38 GMT, Johan Sj?len wrote: >> Hi! >> >> This PR ports PrintDeoptimizationDetails to UL by mapping its output to debug level with tag deoptimization. 
> > Johan Sj?len has updated the pull request incrementally with one additional commit since the last revision: > > Remove WizardMode && Verbose as per Coleen, add back WizardMode as > > per DHolmes I made a new comparison and found this: - UL adds a line, which comes from this PR: https://github.com/openjdk/jdk/pull/8812 - UL has the precise description of locals missing, looks like this in PrintDeoptimizationDetails: locals: 0 "true"{0x000000011d00cd40} <0x000000011d00cd40> 1 NULL <0x0000000000000000> 2 0 (int) 0,000000 (float) 0 (hex) expressions: 0 NULL <0x0000000000000000> No clue why this is missing so far, looking into it. ------------- PR: https://git.openjdk.org/jdk/pull/10645 From tsteele at openjdk.org Thu Oct 27 14:15:39 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 14:15:39 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v2] In-Reply-To: References: Message-ID: <-0WuSW5EvM4cHDwBiIamjC3ltdYZzDNWgrTgwfQmfcg=.22c07703-48a3-444e-8d9f-ed119a35523e@github.com> > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Fixup comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/0a683eee..49e701d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=00-01 Stats: 7 lines in 2 files changed: 2 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 14:15:40 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 14:15:40 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v2] In-Reply-To: References: Message-ID: <_XLIVYTdoWhVNuTrh4OQaz_aGWROk9xUsDL7HytPsPY=.b5be873c-44aa-4dae-9958-91dd8154691c@github.com> On Wed, 26 Oct 2022 20:48:32 GMT, Martin Doerr wrote: >> Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixup comments > > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 2888: > >> 2886: >> 2887: // Check return val of vm call >> 2888: // if (return val != 0) > > Comment should also get updated. Agreed. This change is complete. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From matsaave at openjdk.org Thu Oct 27 14:31:43 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 27 Oct 2022 14:31:43 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v3] In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. 
> > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 > 10 invokedynamic bsm=31 13 > BSM: REF_invokeStatic 32 > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 > 18 iinc #1 1 > 21 goto 2 > 24 return Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: changed NULL to nullptr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10860/files - new: https://git.openjdk.org/jdk/pull/10860/files/f007d46b..452ef598 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10860.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10860/head:pull/10860 PR: https://git.openjdk.org/jdk/pull/10860 From coleenp at openjdk.org Thu Oct 27 15:01:28 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 27 Oct 2022 15:01:28 GMT Subject: RFR: 8295964: Move InstanceKlass::_misc_flags Message-ID: I moved misc_flags out to their own misc flags class so that we can put the writeable accessFlags there too with atomic access. 
Tested with tier1-4. ------------- Commit messages: - 8295964: Move InstanceKlass::_misc_flags Changes: https://git.openjdk.org/jdk/pull/10249/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10249&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295964 Stats: 352 lines in 9 files changed: 156 ins; 160 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/10249.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10249/head:pull/10249 PR: https://git.openjdk.org/jdk/pull/10249 From luhenry at openjdk.org Thu Oct 27 15:31:05 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 27 Oct 2022 15:31:05 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V Message-ID: The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. ------------- Commit messages: - 8295948: Support for Zicbop/prefetch instructions on RISC-V Changes: https://git.openjdk.org/jdk/pull/10884/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10884&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295948 Stats: 134 lines in 7 files changed: 130 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10884.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10884/head:pull/10884 PR: https://git.openjdk.org/jdk/pull/10884 From aph at openjdk.org Thu Oct 27 15:34:29 2022 From: aph at openjdk.org (Andrew Haley) Date: Thu, 27 Oct 2022 15:34:29 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic Thanks everyone for your contributions. I think we have a good solution: 1. Add my "fast" alternative RestoreMXCSR code, and use =1 and =2 levels of checking, as suggested by John Rose. Don't enable RestoreMXCSR by default: it should be 0 (no check). 2. Warn and restore the MXCSR at safepoints. This could be a hard error rather than a warning; not sure. At least we wouldn't have silent corruption. Everyone happy? ------------- PR: https://git.openjdk.org/jdk/pull/10661 From tsteele at openjdk.org Thu Oct 27 16:23:37 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 16:23:37 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v3] In-Reply-To: References: Message-ID: > This draft PR implements native method barriers on s390. 
When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Moves nm_entry_barrier implementation. Adds c2i_entry_barrier & resolve_weak_handle stub ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/49e701d9..eb3a1df9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=01-02 Stats: 98 lines in 4 files changed: 73 ins; 25 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 16:32:37 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 16:32:37 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v4] In-Reply-To: References: Message-ID: <-myYF-ZJYWGORQ8yU5skIJFlP9DQuyHZC3UUfv-ZwqM=.7b113ea4-31a4-406a-9cdf-2d4e5c866dd7@github.com> > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Change PreservationLevel -> DecoratorSet in sig of resolve_weak_handle ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/eb3a1df9..1b192d2d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 16:35:41 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 16:35:41 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v5] In-Reply-To: References: Message-ID: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Add Z missing z_'s ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/1b192d2d..4a403b48 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=03-04 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 16:38:34 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 16:38:34 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v6] In-Reply-To: References: Message-ID: > This draft PR implements native method barriers on s390. 
When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Add missing imports ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/4a403b48..ae6c4c34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=04-05 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 17:01:43 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 17:01:43 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v7] In-Reply-To: References: Message-ID: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Add impl for resolve_weak_handle ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/ae6c4c34..32aa96cd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=05-06 Stats: 48 lines in 3 files changed: 24 ins; 19 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 17:11:40 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 17:11:40 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v8] In-Reply-To: References: Message-ID: <_ffBZec1_IAaYoYUw_W_fTx0ZrjcYC3eiPdSdPUsQSA=.40b0c586-fdf9-44c7-8068-4e74ed3aa9cf@github.com> > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Add missing scope to resolve_weak_handle impl ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/32aa96cd..b044abc7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From dcubed at openjdk.org Thu Oct 27 17:34:28 2022 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 27 Oct 2022 17:34:28 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v2] In-Reply-To: References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: On Thu, 27 Oct 2022 10:58:12 GMT, Roman Kennke wrote: >> There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: >> - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. >> - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. >> >> Testing: >> - [x] GHA (x86 and x-compile failures look like infra glitch) >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [x] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix has_owner() condition Marked as reviewed by dcubed (Reviewer). src/hotspot/share/prims/jvmtiEnvBase.cpp line 1410: > 1408: if (mark.has_monitor()) { > 1409: mon = mark.monitor(); > 1410: assert(mon != NULL, "must have monitor"); The original code does not have this `assert()`, but I'm okay with this. src/hotspot/share/prims/jvmtiEnvBase.cpp line 1422: > 1420: // This monitor is owned so we have to find the owning JavaThread. > 1421: owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner); > 1422: assert(owning_thread != NULL, "owning JavaThread must not be NULL"); I finished doing an equivalence analysis of this code that has been replaced with the code in `ObjectSynchronizer::get_lock_owner()`. This `assert()` on L1422 is the only thing I found that is "missing". However, I don't think that you can add back that `assert()` call in this function. The reason that the `assert()` is okay in the original code is because it is "protected" by this line: L1419 if (owner != NULL) { and while that check is also done in `ObjectSynchronizer::get_lock_owner()` on L1032, it DOES NOT `assert()` that the return from `Threads::owning_thread_from_monitor_owner()` is not NULL and in fact has a comment that says: // owning_thread_from_monitor_owner() may also return NULL here If memory serves, `ObjectSynchronizer::get_lock_owner()` can be called from locations where we have non-NULL `owner`, but when we try to find the owning thread, we can sometimes get a NULL back. src/hotspot/share/runtime/objectMonitor.inline.hpp line 61: > 59: inline bool ObjectMonitor::has_owner() const { > 60: void* owner = owner_raw(); > 61: return owner != NULL && owner != DEFLATER_MARKER; You could also do: return owner() != nullptr; and take advantage of the fact that `owner()` filters out DEFLATER_MARKER for you. 
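To make the equivalence concrete, here is a small stand-alone sketch (stand-in types only, not the actual HotSpot sources) in which owner() masks out DEFLATER_MARKER as described above; the explicit two-part check and the shorter owner() != nullptr form then agree for every raw owner value:

    // Stand-in sentinel: any distinguished non-null address works for the argument.
    static int deflater_sentinel_storage;
    static void* const DEFLATER_MARKER = &deflater_sentinel_storage;

    struct MonitorSketch {
      void* _owner = nullptr;

      void* owner_raw() const { return _owner; }

      // Assumed behaviour of ObjectMonitor::owner(): hide the deflation sentinel.
      void* owner() const {
        void* o = owner_raw();
        return o != DEFLATER_MARKER ? o : nullptr;
      }

      // The explicit form used in the patch ...
      bool has_owner_explicit() const {
        void* o = owner_raw();
        return o != nullptr && o != DEFLATER_MARKER;
      }

      // ... and the shorter form suggested above; the two agree for any _owner value.
      bool has_owner_short() const { return owner() != nullptr; }
    };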
------------- PR: https://git.openjdk.org/jdk/pull/10849 From tsteele at openjdk.org Thu Oct 27 18:21:10 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 18:21:10 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v9] In-Reply-To: References: Message-ID: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Fixup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/b044abc7..a8bb7a06 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=07-08 Stats: 5 lines in 3 files changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 18:24:11 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 18:24:11 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v10] In-Reply-To: References: Message-ID: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: - Fixup - Fixup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/a8bb7a06..80804bec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From dcubed at openjdk.org Thu Oct 27 18:36:29 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 27 Oct 2022 18:36:29 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v2] In-Reply-To: References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: On Thu, 27 Oct 2022 10:58:12 GMT, Roman Kennke wrote: >> There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: >> - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. >> - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. 
I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. >> >> Testing: >> - [x] GHA (x86 and x-compile failures look like infra glitch) >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [x] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix has_owner() condition Definitely want someone from Serviceability to take a look at the changes in: - src/hotspot/share/prims/jvmtiEnvBase.cpp - src/hotspot/share/services/threadService.cpp Ping @plummercj or @sspitsyn... ------------- PR: https://git.openjdk.org/jdk/pull/10849 From dcubed at openjdk.org Thu Oct 27 19:57:37 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 27 Oct 2022 19:57:37 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v7] In-Reply-To: References: Message-ID: <_4pEeoarSDmRHeH3FOYwQz8RHokONrWwGdNIxv7Kpjo=.d82f866d-abd6-45b4-b7b0-9bd27a06294f@github.com> On Mon, 24 Oct 2022 08:03:13 GMT, Roman Kennke wrote: >> This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. 
However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> ### Benchmarks >> >> All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. >> >> #### DaCapo/AArch64 >> >> Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? 
>> >> benchmark | baseline | fast-locking | % | size >> -- | -- | -- | -- | -- >> avrora | 27859 | 27563 | 1.07% | large >> batik | 20786 | 20847 | -0.29% | large >> biojava | 27421 | 27334 | 0.32% | default >> eclipse | 59918 | 60522 | -1.00% | large >> fop | 3670 | 3678 | -0.22% | default >> graphchi | 2088 | 2060 | 1.36% | default >> h2 | 297391 | 291292 | 2.09% | huge >> jme | 8762 | 8877 | -1.30% | default >> jython | 18938 | 18878 | 0.32% | default >> luindex | 1339 | 1325 | 1.06% | default >> lusearch | 918 | 936 | -1.92% | default >> pmd | 58291 | 58423 | -0.23% | large >> sunflow | 32617 | 24961 | 30.67% | large >> tomcat | 25481 | 25992 | -1.97% | large >> tradebeans | 314640 | 311706 | 0.94% | huge >> tradesoap | 107473 | 110246 | -2.52% | huge >> xalan | 6047 | 5882 | 2.81% | default >> zxing | 970 | 926 | 4.75% | default >> >> #### DaCapo/x86_64 >> >> The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. >> >> benchmark | baseline | fast-Locking | % | size >> -- | -- | -- | -- | -- >> avrora | 127690 | 126749 | 0.74% | large >> batik | 12736 | 12641 | 0.75% | large >> biojava | 15423 | 15404 | 0.12% | default >> eclipse | 41174 | 41498 | -0.78% | large >> fop | 2184 | 2172 | 0.55% | default >> graphchi | 1579 | 1560 | 1.22% | default >> h2 | 227614 | 230040 | -1.05% | huge >> jme | 8591 | 8398 | 2.30% | default >> jython | 13473 | 13356 | 0.88% | default >> luindex | 824 | 813 | 1.35% | default >> lusearch | 962 | 968 | -0.62% | default >> pmd | 40827 | 39654 | 2.96% | large >> sunflow | 53362 | 43475 | 22.74% | large >> tomcat | 27549 | 28029 | -1.71% | large >> tradebeans | 190757 | 190994 | -0.12% | huge >> tradesoap | 68099 | 67934 | 0.24% | huge >> xalan | 7969 | 8178 | -2.56% | default >> zxing | 1176 | 1148 | 2.44% | default >> >> #### Renaissance/AArch64 >> >> This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
>> >> benchmark | baseline | fast-locking | % >> -- | -- | -- | -- >> AkkaUct | 2558.832 | 2513.594 | 1.80% >> Reactors | 14715.626 | 14311.246 | 2.83% >> Als | 1851.485 | 1869.622 | -0.97% >> ChiSquare | 1007.788 | 1003.165 | 0.46% >> GaussMix | 1157.491 | 1149.969 | 0.65% >> LogRegression | 717.772 | 733.576 | -2.15% >> MovieLens | 7916.181 | 8002.226 | -1.08% >> NaiveBayes | 395.296 | 386.611 | 2.25% >> PageRank | 4294.939 | 4346.333 | -1.18% >> FjKmeans | 496.076 | 493.873 | 0.45% >> FutureGenetic | 2578.504 | 2589.255 | -0.42% >> Mnemonics | 4898.886 | 4903.689 | -0.10% >> ParMnemonics | 4260.507 | 4210.121 | 1.20% >> Scrabble | 139.37 | 138.312 | 0.76% >> RxScrabble | 320.114 | 322.651 | -0.79% >> Dotty | 1056.543 | 1068.492 | -1.12% >> ScalaDoku | 3443.117 | 3449.477 | -0.18% >> ScalaKmeans | 259.384 | 258.648 | 0.28% >> Philosophers | 24333.311 | 23438.22 | 3.82% >> ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% >> FinagleChirper | 6814.192 | 6853.38 | -0.57% >> FinagleHttp | 4762.902 | 4807.564 | -0.93% >> >> #### Renaissance/x86_64 >> >> benchmark | baseline | fast-locking | % >> -- | -- | -- | -- >> AkkaUct | 1117.185 | 1116.425 | 0.07% >> Reactors | 11561.354 | 11812.499 | -2.13% >> Als | 1580.838 | 1575.318 | 0.35% >> ChiSquare | 459.601 | 467.109 | -1.61% >> GaussMix | 705.944 | 685.595 | 2.97% >> LogRegression | 659.944 | 656.428 | 0.54% >> MovieLens | 7434.303 | 7592.271 | -2.08% >> NaiveBayes | 413.482 | 417.369 | -0.93% >> PageRank | 3259.233 | 3276.589 | -0.53% >> FjKmeans | 946.429 | 938.991 | 0.79% >> FutureGenetic | 1760.672 | 1815.272 | -3.01% >> ParMnemonics | 2016.917 | 2033.101 | -0.80% >> Scrabble | 147.996 | 150.084 | -1.39% >> RxScrabble | 177.755 | 177.956 | -0.11% >> Dotty | 673.754 | 683.919 | -1.49% >> ScalaDoku | 2193.562 | 1958.419 | 12.01% >> ScalaKmeans | 165.376 | 168.925 | -2.10% >> ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% >> Philosophers | 14268.449 | 13308.87 | 7.21% >> FinagleChirper | 4722.13 | 4688.3 | 0.72% >> FinagleHttp | 3497.241 | 3605.118 | -2.99% >> >> Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. >> >> I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). >> >> Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. >> >> ### Testing >> - [x] tier1 (x86_64, aarch64, x86_32) >> - [x] tier2 (x86_64, aarch64) >> - [x] tier3 (x86_64, aarch64) >> - [x] tier4 (x86_64, aarch64) >> - [x] jcstress 3-days -t sync -af GLOBAL (x86_64, aarch64) > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits: > > - Merge remote-tracking branch 'upstream/master' into fast-locking > - More RISC-V fixes > - Merge remote-tracking branch 'origin/fast-locking' into fast-locking > - RISC-V port > - Revert "Re-use r0 in call to unlock_object()" > > This reverts commit ebbcb615a788998596f403b47b72cf133cb9de46. 
> - Merge remote-tracking branch 'origin/fast-locking' into fast-locking > - Fix number of rt args to complete_monitor_locking_C, remove some comments > - Re-use r0 in call to unlock_object() > - Merge tag 'jdk-20+17' into fast-locking > > Added tag jdk-20+17 for changeset 79ccc791 > - Fix OSR packing in AArch64, part 2 > - ... and 25 more: https://git.openjdk.org/jdk/compare/65c84e0c...a67eb95e This PR has been in "merge-conflict" state for about 10 days. When do you plan to merge again with the jdk/jdk repo? ------------- PR: https://git.openjdk.org/jdk/pull/10590 From tsteele at openjdk.org Thu Oct 27 20:37:50 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 20:37:50 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v11] In-Reply-To: References: Message-ID: <71p2Q3jBkaIB0A0UK58eQqb-BGpnEgFCOuY56TuSEkE=.03c107da-5b65-4679-b689-660b67dc9a6e@github.com> > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Add extra spill requested by fisk ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/80804bec..1e1ffe34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=09-10 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 20:40:52 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 20:40:52 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v12] In-Reply-To: References: Message-ID: <0_zSfbD9mO_XpeH1wk7NnnMhvAJMLg-pS7s_UIJyqnw=.1e0ab631-c053-4f20-8d27-cfa221a83e26@github.com> > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: - Fixup - Fixup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/1e1ffe34..0647d13a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=10-11 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From jrose at openjdk.org Thu Oct 27 20:41:44 2022 From: jrose at openjdk.org (John R Rose) Date: Thu, 27 Oct 2022 20:41:44 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v7] In-Reply-To: References: Message-ID: On Mon, 24 Oct 2022 11:01:01 GMT, Robbin Ehn wrote: > Secondly, a question/suggestion: Many recursive cases do not interleave locks, meaning the recursive enter will happen with the lock/oop top of lock stack already. Why not peak at top lock/oop in lock-stack if the is current just push it again and the locking is done? 
(instead of inflating) (exit would need to check if this is the last one and then proper exit) The CJM paper (Dice/Kogan 2021) mentions a "nesting" counter for this purpose. I suspect that a real counter is overkill, and the "unary" representation Robbin mentions would be fine, especially if there were a point (when the per-thread stack gets too big) at which we go and inflate anyway. The CJM paper suggests a full search of the per-thread array to detect the recursive condition, but again I like Robbin's idea of checking only the most recent lock record. So the data structure for lock records (per thread) could consist of a series of distinct values [ A B C ] and each of the values could be repeated, but only adjacently: [ A A A B C C ] for example. And there could be a depth limit as well. Any sequence of held locks not expressible within those limitations could go to inflation as a backup. ------------- PR: https://git.openjdk.org/jdk/pull/10590 From mcimadamore at openjdk.org Thu Oct 27 21:00:07 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 27 Oct 2022 21:00:07 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) Message-ID: This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. [1] - https://openjdk.org/jeps/434 ------------- Commit messages: - Merge branch 'master' into PR_20 - Merge pull request #14 from minborg/small-javadoc - Update some javadocs - Revert some javadoc changes - Merge branch 'master' into PR_20 - Fix benchmark and test failure - Merge pull request #13 from minborg/revert-factories - Update javadocs after comments - Revert MemorySegment factories - Merge pull request #12 from minborg/fix-lookup-find - ... and 6 more: https://git.openjdk.org/jdk/compare/78454b69...ac7733da Changes: https://git.openjdk.org/jdk/pull/10872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295044 Stats: 10527 lines in 200 files changed: 4754 ins; 3539 del; 2234 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Thu Oct 27 21:00:07 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 27 Oct 2022 21:00:07 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) In-Reply-To: References: Message-ID: On Wed, 26 Oct 2022 13:11:50 GMT, Maurizio Cimadamore wrote: > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Here are the main API changes introduced in this round (there are also some JVM changes which will be integrated separately): * The main change is the removal of `MemoryAddress` and `Addressable`. Instead, *zero-length memory segments* are used whenever the API needs to model "raw" addresses coming from native code. This simplifies the API, removing an ambiguous abstraction as well as some duplication in the API (see accessor methods in `MemoryAddress`); * To allow for "unsafe" access of zero-length memory segments, a new method has been added to `ValueLayout.OfAddress`, namely `asUnbounded`. 
This new restricted method takes an address layout and creates a new unbounded address layout. When using an unbounded layout to dereference memory, or construct downcall method handles, the API will create memory segments with maximal length (i.e. `Long.MAX_VALUE`, rather than zero-length memory segments, which can therefore be accessed; * The `MemoryLayout` hierarchy has been improved in several ways. First, the hierarchy is now defined in terms of sealed interfaces (intermediate abstract classes have been moved into the implementation package). The hierarchy is also exhaustive now, and works much better to pattern matching. More specifically, three new types have been added: `PaddingLayout`, `StructLayout` and `UnionLayout`, the latter two are a subtype of `GroupLayout`. Thanks to this move, several predicate methods (`isPadding`, `isStruct`, `isUnion`) have been dropped from the API; * The `SymbolLookup::lookup` method has been renamed to `SymbolLookup::find` - to avoid using the same word `lookup` in both noun and verb form, which leads to confusion; * A new method, on `ModuleLayer.Controller` has been added to enable native access on a module in a custom layer; * The new interface `Linker.Option` has been introduced. This is a tag interface accepted in `Linker::downcallHandle`. At the moment, only a single option is provided, to specify variadic function calls (because of this, the `FunctionDescriptor` interface has been simplified, and is now a simple carrier of arguments/return layouts). More linker options will follow. Javadoc: http://cr.openjdk.java.net/~mcimadamore/jdk/8295044/v1/javadoc/java.base/java/lang/foreign/package-summary.html ------------- PR: https://git.openjdk.org/jdk/pull/10872 From forax at univ-mlv.fr Thu Oct 27 21:03:53 2022 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 27 Oct 2022 23:03:53 +0200 (CEST) Subject: RFR: 8291555: Replace stack-locking with fast-locking [v7] In-Reply-To: References: Message-ID: <231161996.35475533.1666904633911.JavaMail.zimbra@u-pem.fr> ----- Original Message ----- > From: "John R Rose" > To: hotspot-dev at openjdk.org, serviceability-dev at openjdk.org, shenandoah-dev at openjdk.org > Sent: Thursday, October 27, 2022 10:41:44 PM > Subject: Re: RFR: 8291555: Replace stack-locking with fast-locking [v7] > On Mon, 24 Oct 2022 11:01:01 GMT, Robbin Ehn wrote: > >> Secondly, a question/suggestion: Many recursive cases do not interleave locks, >> meaning the recursive enter will happen with the lock/oop top of lock stack >> already. Why not peak at top lock/oop in lock-stack if the is current just push >> it again and the locking is done? (instead of inflating) (exit would need to >> check if this is the last one and then proper exit) > > The CJM paper (Dice/Kogan 2021) mentions a "nesting" counter for this purpose. > I suspect that a real counter is overkill, and the "unary" representation > Robbin mentions would be fine, especially if there were a point (when the > per-thread stack gets too big) at which we go and inflate anyway. > > The CJM paper suggests a full search of the per-thread array to detect the > recursive condition, but again I like Robbin's idea of checking only the most > recent lock record. > > So the data structure for lock records (per thread) could consist of a series of > distinct values [ A B C ] and each of the values could be repeated, but only > adjacently: [ A A A B C C ] for example. And there could be a depth limit as > well. 
Any sequence of held locks not expressible within those limitations > could go to inflation as a backup. Hi John, a certainly stupid question, i've some trouble to see how it can be implemented given that because of lock coarsening (+ may be OSR), the number of time a lock is held is different between the interpreted code and the compiled code. R?mi > > ------------- > > PR: https://git.openjdk.org/jdk/pull/10590 From tsteele at openjdk.org Thu Oct 27 21:06:37 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 21:06:37 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v13] In-Reply-To: References: Message-ID: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Clean up comments. Adjust copyright years. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/0647d13a..a744ce34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=11-12 Stats: 3 lines in 3 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From jnimeh at openjdk.org Thu Oct 27 21:21:33 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Thu, 27 Oct 2022 21:21:33 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Thu, 27 Oct 2022 09:22:03 GMT, Jatin Bhateja wrote: >> One small thing maybe: It doesn't look like R in `processMultipleBlocks` and `rbytes` ever changes, so maybe there's no need to repeatedly serialize/deserialize them on every call to engineUpdate? There is already an `r` that is attached to the object that is an IntegerModuloP. Could that be used in `processMultipleBlocks` and perhaps a private byte[] for a serialized r is also a field in Poly1305 that can be passed into the intrinsic method rather than creating it every time? It could be set in `setRSVals`. Perhaps we can recover a little performance there? > >> 10% is not a negligible impact. I see your point about AVX512 reaping the rewards of this change, but there are plenty of x86_64 systems without AVX512 that will be impacted, not to mention other platforms like aarch64 which (for this change at least) will never see the benefits from the intrinsic. >> >> I don't have any suggestions right at this moment for how this could be streamlined at all to help reduce the pain for non-AVX512 systems. Worth looking into though. > > Do you suggest using white box APIs for CPU feature query during poly static initialization and perform multi block processing only for relevant platforms and keep the original implementation sacrosanct for other targets. VM does offer native white box primitives and currently its being used by tests infrastructure. No, going the WhiteBox route was not something I was thinking of. I sought feedback from a couple hotspot-knowledgable people about the use of WhiteBox APIs and both felt that it was not the right way to go. 
One said that WhiteBox is really for VM testing and not for these kinds of java classes. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From mdoerr at openjdk.org Thu Oct 27 21:38:54 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Oct 2022 21:38:54 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v13] In-Reply-To: References: Message-ID: On Thu, 27 Oct 2022 21:06:37 GMT, Tyler Steele wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Clean up comments. Adjust copyright years. I have to revoke my review. This does no longer look correct. src/hotspot/cpu/s390/gc/g1/g1BarrierSetAssembler_s390.cpp line 218: > 216: > 217: // TODO: GPRS: 5 + FPRS: 8 + Flags:1? > 218: const int nbytes_save = (5 + 8 + 1) * BytesPerWord; Flags are not needed. src/hotspot/cpu/s390/gc/g1/g1BarrierSetAssembler_s390.cpp line 243: > 241: __ save_return_pc(); > 242: __ push_frame_abi160(nbytes_save); // Will use Z_R0 as tmp. > 243: __ save_volatile_regs(Z_SP, frame::z_abi_160_size, true, true); Adds overhead for other usages which don't need it. May be still ok, though. src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 169: > 167: // Load class loader data to determine whether the method's holder is concurrently unloading. > 168: __ load_method_holder(Z_R0_scratch, Z_method); > 169: __ z_lg(Z_R0_scratch, in_bytes(InstanceKlass::class_loader_data_offset()), Z_R0_scratch); I don't think R0 can be used for storage addressing. src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 178: > 176: // Class loader is weak. Determine whether the holder is still alive. > 177: __ z_lg(Z_R1_scratch, in_bytes(ClassLoaderData::holder_offset()), Z_R0_scratch); > 178: __ resolve_weak_handle(Address(Z_R1_scratch), Z_R1_scratch, Z_R0_scratch, Z_R2); You're killing R2 which contains the 1st argument. ------------- Changes requested by mdoerr (Reviewer). PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 22:21:35 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 22:21:35 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v13] In-Reply-To: References: Message-ID: On Thu, 27 Oct 2022 21:25:53 GMT, Martin Doerr wrote: >> Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up comments. Adjust copyright years. > > src/hotspot/cpu/s390/gc/g1/g1BarrierSetAssembler_s390.cpp line 218: > >> 216: >> 217: // TODO: GPRS: 5 + FPRS: 8 + Flags:1? >> 218: const int nbytes_save = (5 + 8 + 1) * BytesPerWord; > > Flags are not needed. I thought you might say that. This has been corrected. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 22:24:43 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 22:24:43 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v13] In-Reply-To: References: Message-ID: <2kja_nAfdhj3VK_6NHKbEZPUWHDUQcuzqAC68Z2c9Yo=.ca3d0f57-e305-49f3-8dc7-3b0fea69e698@github.com> On Thu, 27 Oct 2022 21:06:37 GMT, Tyler Steele wrote: >> This draft PR implements native method barriers on s390. 
When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Clean up comments. Adjust copyright years. Written earlier; tests just completed. I made some changes: - I agree that nmethod_entry_barrier was in a silly place, so I moved it to a better one. This change won't need to be reviewed. - I have added the c2i entry barrier code which will need to be reviewed. I believe any of my reviewers would be able to do this. - I have added a first pass for the changes requested to `g1_write_barrier_pre()`. I am not totally clear on what needs to happen, so there will probably be more to do here. Please take a look @fisk. I've run the tests in `test/hotspot/jtreg/gc` again. They are passing, but since they passed before the change I don't believe this tells up much. Is there a better suite to run, or is this functionality not covered by any test suite? I imagine getting the GC to a particular state, then triggering a usually automatic action would be very difficult in many cases. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 22:32:32 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 22:32:32 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v13] In-Reply-To: References: Message-ID: On Thu, 27 Oct 2022 21:33:28 GMT, Martin Doerr wrote: >> Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up comments. Adjust copyright years. > > src/hotspot/cpu/s390/gc/g1/g1BarrierSetAssembler_s390.cpp line 243: > >> 241: __ save_return_pc(); >> 242: __ push_frame_abi160(nbytes_save); // Will use Z_R0 as tmp. >> 243: __ save_volatile_regs(Z_SP, frame::z_abi_160_size, true, true); > > Adds overhead for other usages which don't need it. May be still ok, though. Agreed. The PPC implementation I looked at has more complicated logic for determining if a frame is needed, but it relies on a PreservationLevel flag which is not present on s390. If there is another way to deduce whether a frame is required here, or if we should add that flag on s390, I am happy to make that change as well. > src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 178: > >> 176: // Class loader is weak. Determine whether the holder is still alive. >> 177: __ z_lg(Z_R1_scratch, in_bytes(ClassLoaderData::holder_offset()), Z_R0_scratch); >> 178: __ resolve_weak_handle(Address(Z_R1_scratch), Z_R1_scratch, Z_R0_scratch, Z_R2); > > You're killing R2 which contains the 1st argument. I also thought this might draw some attention. Is there a better register to use (R7?), or will I simply have to save & restore the value of whatever register I choose? ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 23:21:08 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 23:21:08 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v13] In-Reply-To: References: Message-ID: On Thu, 27 Oct 2022 21:35:41 GMT, Martin Doerr wrote: > I have to revoke my review. This does no longer look correct. Noted. Since the platform is building again, it may be a good idea to follow your suggestion of pushing the changes up to your review, and then create a separate PR for the new changes. 
I suppose this depends on whether @fisk believes the previous changes (up to the time of their first comment)[[1]](https://github.com/openjdk/jdk/pull/10558#pullrequestreview-1157627111) to be complete. > src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 169: > >> 167: // Load class loader data to determine whether the method's holder is concurrently unloading. >> 168: __ load_method_holder(Z_R0_scratch, Z_method); >> 169: __ z_lg(Z_R0_scratch, in_bytes(InstanceKlass::class_loader_data_offset()), Z_R0_scratch); > > I don't think R0 can be used for storage addressing. Those cases where 'if you use R0 it really means 0 and not the contents of the register' are a real gotcha. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Thu Oct 27 23:46:43 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 27 Oct 2022 23:46:43 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v14] In-Reply-To: References: Message-ID: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Changes - Adds labels to registers used in c2i_entry_barrier - Removes use of R2 - Removes erroneous uses of R0 - Adds nm->c2i_entry_barrier in gen_i2c2i_adapters ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/a744ce34..72382f7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=12-13 Stats: 23 lines in 3 files changed: 9 ins; 1 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From sspitsyn at openjdk.org Fri Oct 28 00:04:27 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 28 Oct 2022 00:04:27 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v2] In-Reply-To: References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: <3-a1_O36gyjHVOmsO_Mn_vUg7xFVX6BlFa2drxzK_24=.bcd3a5ef-8b0f-4d6c-8728-091331950909@github.com> On Thu, 27 Oct 2022 10:58:12 GMT, Roman Kennke wrote: >> There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: >> - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. >> - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. 
>> >> Testing: >> - [x] GHA (x86 and x-compile failures look like infra glitch) >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [x] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix has_owner() condition Dan, I'm already reviewing this. ------------- PR: https://git.openjdk.org/jdk/pull/10849 From sspitsyn at openjdk.org Fri Oct 28 00:20:26 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 28 Oct 2022 00:20:26 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v2] In-Reply-To: References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: On Thu, 27 Oct 2022 10:58:12 GMT, Roman Kennke wrote: >> There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: >> - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. >> - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. >> >> Testing: >> - [x] GHA (x86 and x-compile failures look like infra glitch) >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [x] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix has_owner() condition This is nice simplification. It looks pretty good. What JVMTI and JDI tests were run to verify the fix? Thanks, Serguei ------------- PR: https://git.openjdk.org/jdk/pull/10849 From sspitsyn at openjdk.org Fri Oct 28 00:23:32 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 28 Oct 2022 00:23:32 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v2] In-Reply-To: References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: On Thu, 27 Oct 2022 10:58:12 GMT, Roman Kennke wrote: >> There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: >> - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. >> - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. 
I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. >> >> Testing: >> - [x] GHA (x86 and x-compile failures look like infra glitch) >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [x] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix has_owner() condition Marked as reviewed by sspitsyn (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10849 From cjplummer at openjdk.org Fri Oct 28 00:55:44 2022 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 28 Oct 2022 00:55:44 GMT Subject: RFR: 8294321: Fix typos in files under test/jdk/java, test/jdk/jdk, test/jdk/jni [v2] In-Reply-To: <-0yo8KceENmJ48YPNoHCUkx_iEWpIE0mPJn_-BkjbWY=.76a8dcb8-f43a-4c8b-8912-43c7225c183d@github.com> References: <-0yo8KceENmJ48YPNoHCUkx_iEWpIE0mPJn_-BkjbWY=.76a8dcb8-f43a-4c8b-8912-43c7225c183d@github.com> Message-ID: On Fri, 7 Oct 2022 12:51:26 GMT, Alan Bateman wrote: >> Michael Ernst has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Reinstate typos in Apache code that is copied into the JDK >> - Merge ../jdk-openjdk into typos-typos >> - Remove file that was removed upstream >> - Fix inconsistency in capitalization >> - Undo change in zlip >> - Fix typos > > src/java.se/share/data/jdwp/jdwp.spec line 101: > >> 99: "platform thread " >> 100: "in the target VM. This includes platform threads created with the Thread " >> 101: "API and all native threads attached to the target VM with JNI code." > > The spec for the JDWP AllThreads command was significantly reworded in Java 19 so this is where this typo crept in. We have JDK-8294672 tracking it to fix for Java 20, maybe you should take it? Since this PR has gone stale, I'll be fixing this typo in jdwp.spec via [JDK-8294672](https://bugs.openjdk.org/browse/JDK-8294672). ------------- PR: https://git.openjdk.org/jdk/pull/10029 From sspitsyn at openjdk.org Fri Oct 28 01:41:00 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 28 Oct 2022 01:41:00 GMT Subject: RFR: 8295964: Move InstanceKlass::_misc_flags In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 13:29:45 GMT, Coleen Phillimore wrote: > I moved misc_flags out to their own misc flags class so that we can put the writeable accessFlags there too with atomic access. > > Tested with tier1-4. Nice refactoring and simplification. It looks good to me. Thanks, Serguei src/hotspot/share/prims/jvmtiRedefineClasses.cpp line 4408: > 4406: if (!the_class->has_been_redefined()) { > 4407: the_class->set_has_been_redefined(); > 4408: } Nit: Is this change really needed? ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10249 From dholmes at openjdk.org Fri Oct 28 01:49:31 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 28 Oct 2022 01:49:31 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v7] In-Reply-To: References: Message-ID: <6KaO6YDJAQZSps49h6TddX8-aXFEfOFCfLgpi1_90Ag=.d7fe0ac9-d392-4784-a13e-85f5212e00f1@github.com> On Thu, 27 Oct 2022 20:38:57 GMT, John R Rose wrote: > So the data structure for lock records (per thread) could consist of a series of distinct values [ A B C ] and each of the values could be repeated, but only adjacently: [ A A A B C C ] for example. @rose00 why only adjacently? 
Nested locking can be interleaved on different monitors. ------------- PR: https://git.openjdk.org/jdk/pull/10590 From dholmes at openjdk.org Fri Oct 28 04:07:24 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 28 Oct 2022 04:07:24 GMT Subject: RFR: 8295964: Move InstanceKlass::_misc_flags In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 13:29:45 GMT, Coleen Phillimore wrote: > I moved misc_flags out to their own misc flags class so that we can put the writeable accessFlags there too with atomic access. > > Tested with tier1-4. Refactoring looks good. These are all the immutable flags right? (ie set once during class creation, and no concurrency issues). One query on SA below. Thanks. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 129: > 127: > 128: MISC_REWRITTEN = db.lookupIntConstant("InstanceKlass::_misc_rewritten").intValue(); > 129: MISC_HAS_NONSTATIC_FIELDS = db.lookupIntConstant("InstanceKlass::_misc_has_nonstatic_fields").intValue(); These all seem to have been deleted rather than modified ?? ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/10249 From dholmes at openjdk.org Fri Oct 28 04:07:25 2022 From: dholmes at openjdk.org (David Holmes) Date: Fri, 28 Oct 2022 04:07:25 GMT Subject: RFR: 8295964: Move InstanceKlass::_misc_flags In-Reply-To: References: Message-ID: On Fri, 28 Oct 2022 01:36:56 GMT, Serguei Spitsyn wrote: >> I moved misc_flags out to their own misc flags class so that we can put the writeable accessFlags there too with atomic access. >> >> Tested with tier1-4. > > src/hotspot/share/prims/jvmtiRedefineClasses.cpp line 4408: > >> 4406: if (!the_class->has_been_redefined()) { >> 4407: the_class->set_has_been_redefined(); >> 4408: } > > Nit: Is this change really needed? Seems unrelated to this refactoring. If really a bug it should be fixed separately. ------------- PR: https://git.openjdk.org/jdk/pull/10249 From rehn at openjdk.org Fri Oct 28 06:35:07 2022 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 28 Oct 2022 06:35:07 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking In-Reply-To: <231161996.35475533.1666904633911.JavaMail.zimbra@u-pem.fr> References: <231161996.35475533.1666904633911.JavaMail.zimbra@u-pem.fr> Message-ID: On Fri, 28 Oct 2022 03:32:58 GMT, Remi Forax wrote: > i've some trouble to see how it can be implemented given that because of lock coarsening (+ may be OSR), the number of time a lock is held is different between the interpreted code and the compiled code. Correct me if I'm wrong, only C2 eliminates locks and C2 only compile if there is proper structured locking. This should mean that when we restore the eliminated locks in deopt we can inflate the recursive locks which are no longer interleaved and restructure the lock-stack accordingly. Is there another situation than deopt where it would matter? ------------- PR: https://git.openjdk.org/jdk/pull/10590 From lucy at openjdk.org Fri Oct 28 08:14:36 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 28 Oct 2022 08:14:36 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v13] In-Reply-To: References: Message-ID: On Thu, 27 Oct 2022 23:18:46 GMT, Tyler Steele wrote: >> src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 169: >> >>> 167: // Load class loader data to determine whether the method's holder is concurrently unloading. 
>>> 168: __ load_method_holder(Z_R0_scratch, Z_method); >>> 169: __ z_lg(Z_R0_scratch, in_bytes(InstanceKlass::class_loader_data_offset()), Z_R0_scratch); >> >> I don't think R0 can be used for storage addressing. > > Those cases where 'if you use R0 it really means 0 and not the contents of the register' are a real gotcha. It's very simple, basically. R0 is good for everything, except for address calculations which are performed as part of an instruction execution, in which case R0 indicates "not specified". Take LA Z_R0,0(Z_R0,Z_R0) as an example. The leftmost use of Z_R0 is perfectly fine. It just serves result register. The middle (index) and right (base address) occurrences, however, are inputs for the triadic add to form the address. As said above, index and base register are "not specified" in the example and a "0" is fed into the adder. You will get used to this pretty quick. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From rkennke at openjdk.org Fri Oct 28 09:22:27 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 28 Oct 2022 09:22:27 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v2] In-Reply-To: References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: On Fri, 28 Oct 2022 00:16:34 GMT, Serguei Spitsyn wrote: > This is nice simplification. It looks pretty good. To be more safe, I'd suggest to run tier5 as well. Thanks, Serguei AFAIK, there is no tier5 (in OpenJDK). If you have something internal, then please give it a spin? Thanks, Roman ------------- PR: https://git.openjdk.org/jdk/pull/10849 From rkennke at openjdk.org Fri Oct 28 09:32:58 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 28 Oct 2022 09:32:58 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v8] In-Reply-To: References: Message-ID: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' 
are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. 
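For readers following the lock-stack description above, here is a minimal, self-contained model of the fast path. It is illustrative only: the types, helper names, and the fixed 8-entry array are assumptions, not HotSpot's actual data structures; only the header-bit convention (01 = unlocked, 00 = fast-locked) and the per-thread array of object references follow the text above.

```c++
#include <atomic>
#include <cstdint>

// Illustrative model only -- not HotSpot code. Low two header bits:
// 01 = unlocked, 00 = fast-locked, as described in the text above.
struct Object {
  std::atomic<uintptr_t> header{0x1};   // starts out unlocked
};

struct LockStack {                      // per-thread; typically stays very small
  Object* elems[8] = {};                // no overflow handling in this sketch
  int top = 0;
  void push(Object* o) { elems[top++] = o; }
  bool contains(const Object* o) const {   // "does this thread own o?"
    for (int i = 0; i < top; i++) {
      if (elems[i] == o) return true;
    }
    return false;
  }
};

// Fast-lock attempt: CAS the low header bits from 01 (unlocked) to 00
// (fast-locked) and record the object on this thread's lock stack.
// On failure the caller would fall back to inflating a full monitor.
bool fast_lock(LockStack& ls, Object* o) {
  uintptr_t h = o->header.load();
  if ((h & 0x3) != 0x1) return false;        // not currently unlocked
  uintptr_t locked = h & ~uintptr_t(0x3);    // clear low bits -> fast-locked
  if (!o->header.compare_exchange_strong(h, locked)) return false;
  ls.push(o);
  return true;
}
```

With such a structure, the common question "does the current thread own this lock?" is just a scan of a handful of entries, which is the property the description above relies on.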
I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? > > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. > > benchmark | baseline | fast-Locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. 
> > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 496.076 | 493.873 | 0.45% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaKmeans | 259.384 | 258.648 | 0.28% > Philosophers | 24333.311 | 23438.22 | 3.82% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > ParMnemonics | 2016.917 | 2033.101 | -0.80% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaDoku | 2193.562 | 1958.419 | 12.01% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > Philosophers | 14268.449 | 13308.87 | 7.21% > FinagleChirper | 4722.13 | 4688.3 | 0.72% > FinagleHttp | 3497.241 | 3605.118 | -2.99% > > Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. > > I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). > > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) > - [x] jcstress 3-days -t sync -af GLOBAL (x86_64, aarch64) Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - Merge remote-tracking branch 'upstream/master' into fast-locking - Merge remote-tracking branch 'upstream/master' into fast-locking - Merge remote-tracking branch 'upstream/master' into fast-locking - More RISC-V fixes - Merge remote-tracking branch 'origin/fast-locking' into fast-locking - RISC-V port - Revert "Re-use r0 in call to unlock_object()" This reverts commit ebbcb615a788998596f403b47b72cf133cb9de46. 
- Merge remote-tracking branch 'origin/fast-locking' into fast-locking - Fix number of rt args to complete_monitor_locking_C, remove some comments - Re-use r0 in call to unlock_object() - ... and 27 more: https://git.openjdk.org/jdk/compare/4b89fce0...3f0acba4 ------------- Changes: https://git.openjdk.org/jdk/pull/10590/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10590&range=07 Stats: 4031 lines in 137 files changed: 731 ins; 2703 del; 597 mod Patch: https://git.openjdk.org/jdk/pull/10590.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10590/head:pull/10590 PR: https://git.openjdk.org/jdk/pull/10590 From mdoerr at openjdk.org Fri Oct 28 10:50:33 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Oct 2022 10:50:33 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v13] In-Reply-To: References: Message-ID: On Thu, 27 Oct 2022 22:28:16 GMT, Tyler Steele wrote: >> src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 178: >> >>> 176: // Class loader is weak. Determine whether the holder is still alive. >>> 177: __ z_lg(Z_R1_scratch, in_bytes(ClassLoaderData::holder_offset()), Z_R0_scratch); >>> 178: __ resolve_weak_handle(Address(Z_R1_scratch), Z_R1_scratch, Z_R0_scratch, Z_R2); >> >> You're killing R2 which contains the 1st argument. > > I also thought this might draw some attention. Is there a better register to use, or will I simply have to save & restore the value of whichever register I choose? Using R0 and R1 implicitly is ok, because they are scratch regs. Other regs should get passed by argument. There's already a tmp reg where the barrier is inserted which you can simply pass. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From mdoerr at openjdk.org Fri Oct 28 11:09:30 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Oct 2022 11:09:30 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v13] In-Reply-To: References: Message-ID: On Thu, 27 Oct 2022 22:26:22 GMT, Tyler Steele wrote: >> src/hotspot/cpu/s390/gc/g1/g1BarrierSetAssembler_s390.cpp line 243: >> >>> 241: __ save_return_pc(); >>> 242: __ push_frame_abi160(nbytes_save); // Will use Z_R0 as tmp. >>> 243: __ save_volatile_regs(Z_SP, frame::z_abi_160_size, true, true); >> >> Adds overhead for other usages which don't need it. May be still ok, though. > > Agreed. The PPC implementation I looked at has more complicated logic for determining if a frame is needed, but it relies on a PreservationLevel flag which is not present on s390. If there is another way to deduce whether a frame is required here, or if we should add that flag on s390, I am happy to make that change as well. PPC uses fine-grained control over what needs to get preserved. I think this would be good for s390, too. Be aware that changing this requires modification of the whole GC interface and all GCs. I'd split the work into 2 JBS issues and PRs. Maybe use one for nmethod entry barriers only and one for c2i entry barriers with modified GC interface? I still believe that c2i entry barriers are currently not needed for s390 to work correctly and are hence less urgent. They are needed for concurrent class unloading which only comes with ZGC and ShenandoahGC which are unavailable on s390. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From dcubed at openjdk.org Fri Oct 28 14:28:30 2022 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Fri, 28 Oct 2022 14:28:30 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v2] In-Reply-To: References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: On Thu, 27 Oct 2022 10:58:12 GMT, Roman Kennke wrote: >> There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: >> - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. >> - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. >> >> Testing: >> - [x] GHA (x86 and x-compile failures look like infra glitch) >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [x] tier4 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix has_owner() condition Mach5 testing for the v00 version of this fix: Mach5 Tier1: - 1 known, unrelated test falure: - no task failures Mach5 Tier2: - no test failures - 2 known, unrelated slowdebug build task failures: Mach5 Tier3: - no test or task failures Mach5 Tier4: - 1 known, unrelated test failure: - no task failures Mach5 Tier5: - 1 known, unrelated test failure - no task failures Mach5 Tier6: - no test or task failures Mach5 Tier7: - no test or task failures Mach5 Tier8: - no test or task failures ------------- PR: https://git.openjdk.org/jdk/pull/10849 From lucy at openjdk.org Fri Oct 28 14:38:53 2022 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 28 Oct 2022 14:38:53 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v14] In-Reply-To: References: Message-ID: On Thu, 27 Oct 2022 23:46:43 GMT, Tyler Steele wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Changes > > - Adds labels to registers used in c2i_entry_barrier > - Removes use of R2 > - Removes erroneous uses of R0 > - Adds nm->c2i_entry_barrier in gen_i2c2i_adapters Changes requested by lucy (Reviewer). src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 170: > 168: // Fast path: If no method is given, the call is definitely bad. > 169: __ z_cfi(Z_method, 0); > 170: __ z_bre(bad_call); Why don't you use __ z_ltgr(Z_method, Z_method); __ z_brz(bad_call); The use of CFI is incorrect anyway. It compares the low-order 32 bits only, but Z_method is a 64-bit address. CGFI would be correct. And CGHI would be correct and save two instruction bytes. 
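As a plain C++ illustration of the 32-bit-compare pitfall just described (the pointer value below is made up): a 64-bit Method* whose low-order 32 bits happen to be zero would be misclassified as null by a CFI-style 32-bit compare, which the suggested LTGR/CGFI forms avoid.

```c++
#include <cstdint>
#include <cstdio>

int main() {
  // Made-up 64-bit Method* value whose low-order 32 bits are all zero.
  uint64_t method = 0x0000000100000000ULL;

  bool null_by_32bit_compare = (static_cast<uint32_t>(method) == 0); // what a CFI-style compare sees
  bool null_by_64bit_compare = (method == 0);                        // what CGFI/LTGR see

  std::printf("32-bit compare says null: %d, 64-bit compare says null: %d\n",
              null_by_32bit_compare, null_by_64bit_compare);
  return 0;
}
```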
src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 179: > 177: __ z_llgf(Rtmp2, in_bytes(ClassLoaderData::keep_alive_offset()), Rtmp1); > 178: __ z_cfi(Rtmp2, 0); > 179: __ z_brne(skip_barrier); If I get it correctly, the value (4 bytes) is only loaded into Rtmp2 to test it for zero. Why don't you use __ z_ltr(Rtmp2, Rtmp2); __ z_brnz(skip_barrier); or, even better __ z_lt(Rtmp2, in_bytes(ClassLoaderData::keep_alive_offset()), Rtmp1); __ z_brnz(skip_barrier); src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 185: > 183: __ resolve_weak_handle(Address(Rtmp2), Rtmp2, Rtmp1, Rtmp3); > 184: __ z_cfi(Rtmp2, 0); > 185: __ z_brne(skip_barrier); Why don't you use __ z_ltr(Rtmp2, Rtmp2); __ z_brnz(skip_barrier); if the value in Rtmp2 is 4 significant bytes only, LTGR otherwise. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From mdoerr at openjdk.org Fri Oct 28 14:53:27 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 28 Oct 2022 14:53:27 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v14] In-Reply-To: References: Message-ID: On Thu, 27 Oct 2022 23:46:43 GMT, Tyler Steele wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Changes > > - Adds labels to registers used in c2i_entry_barrier > - Removes use of R2 > - Removes erroneous uses of R0 > - Adds nm->c2i_entry_barrier in gen_i2c2i_adapters src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 164: > 162: Rtmp1 = Z_R1_scratch; > 163: Rtmp2 = Z_R7; > 164: Rtmp3 = Z_R8; Registers other than R0 and R1 should not be hardcoded, here. Please pass them by argument and manage them one level above. We should avoid distributing register selection for maintainablily reasons. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From dcubed at openjdk.org Fri Oct 28 15:17:31 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 28 Oct 2022 15:17:31 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v2] In-Reply-To: References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: On Fri, 28 Oct 2022 09:18:28 GMT, Roman Kennke wrote: >> This is nice simplification. >> It looks pretty good. >> To be more safe, I'd suggest to run tier5 as well. >> Thanks, >> Serguei > >> This is nice simplification. It looks pretty good. To be more safe, I'd suggest to run tier5 as well. Thanks, Serguei > > AFAIK, there is no tier5 (in OpenJDK). If you have something internal, then please give it a spin? > > Thanks, > Roman @rkennke - can you merge the latest jdk bits into this PR? After that I'll do more Mach5 testing for you... 
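For illustration, the "fine-grained control over what needs to get preserved" mentioned above could look roughly like the sketch below: each barrier call site states how much register state it needs kept alive, so the stub only saves that much. The enum name, its values, and the function signature are assumptions for illustration, not the actual PPC PreservationLevel definition or a proposed s390 interface.

```c++
// Hypothetical sketch only -- names and granularity are assumptions.
enum PreservationLevel {
  PRESERVE_NONE,            // nothing is live across the barrier call
  PRESERVE_RETURN_PC,       // only the return PC / frame linkage
  PRESERVE_GP_REGS,         // volatile general-purpose registers as well
  PRESERVE_GP_AND_FP_REGS   // volatile GPRs and FPRs (the most expensive case)
};

// A barrier emitter could then push a frame and save registers only when the
// call site's requested level demands it, instead of always saving everything:
void emit_entry_barrier(PreservationLevel level) {
  if (level >= PRESERVE_GP_REGS) {
    // save volatile GPRs (and FPRs for the highest level) around the call
  }
  // ... emit the guard-value check and the slow-path call here ...
}
```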
------------- PR: https://git.openjdk.org/jdk/pull/10849 From rkennke at openjdk.org Fri Oct 28 15:28:42 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 28 Oct 2022 15:28:42 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v3] In-Reply-To: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: <6_2PU8XR4HUKXpssD2310mllx-yajMo6M3f8H6CvasY=.7d2c2123-918e-42d2-bc0b-80253a722fee@github.com> > There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: > - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. > - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. > > Testing: > - [x] GHA (x86 and x-compile failures look like infra glitch) > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [x] tier4 Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into JDK-8295849 - Fix has_owner() condition - Improve condition in OM::has_owner() - Fix OM::has_owner() - 8295849: Consolidate Threads::owning_thread* ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10849/files - new: https://git.openjdk.org/jdk/pull/10849/files/37fa31bf..55365b30 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10849&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10849&range=01-02 Stats: 27543 lines in 440 files changed: 2914 ins; 23896 del; 733 mod Patch: https://git.openjdk.org/jdk/pull/10849.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10849/head:pull/10849 PR: https://git.openjdk.org/jdk/pull/10849 From rkennke at openjdk.org Fri Oct 28 15:28:42 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 28 Oct 2022 15:28:42 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v2] In-Reply-To: References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: On Fri, 28 Oct 2022 09:18:28 GMT, Roman Kennke wrote: >> This is nice simplification. >> It looks pretty good. >> To be more safe, I'd suggest to run tier5 as well. >> Thanks, >> Serguei > >> This is nice simplification. It looks pretty good. To be more safe, I'd suggest to run tier5 as well. Thanks, Serguei > > AFAIK, there is no tier5 (in OpenJDK). If you have something internal, then please give it a spin? > > Thanks, > Roman > @rkennke - can you merge the latest jdk bits into this PR? After that I'll do more Mach5 testing for you... 
Thanks for the testing! I've now merged the latest jdk changes into this PR. ------------- PR: https://git.openjdk.org/jdk/pull/10849 From rkennke at openjdk.org Fri Oct 28 15:29:39 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 28 Oct 2022 15:29:39 GMT Subject: RFR: 8291555: Replace stack-locking with fast-locking [v8] In-Reply-To: References: Message-ID: <7ORZSjVcOQ8IrMAC0iS2pgsf_-vMKZQVmfjxAROqVq4=.267878cb-6392-428c-8a11-b431b2e19cfb@github.com> On Fri, 28 Oct 2022 09:32:58 GMT, Roman Kennke wrote: >> This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. 
All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> ### Benchmarks >> >> All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. >> >> #### DaCapo/AArch64 >> >> Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? >> >> benchmark | baseline | fast-locking | % | size >> -- | -- | -- | -- | -- >> avrora | 27859 | 27563 | 1.07% | large >> batik | 20786 | 20847 | -0.29% | large >> biojava | 27421 | 27334 | 0.32% | default >> eclipse | 59918 | 60522 | -1.00% | large >> fop | 3670 | 3678 | -0.22% | default >> graphchi | 2088 | 2060 | 1.36% | default >> h2 | 297391 | 291292 | 2.09% | huge >> jme | 8762 | 8877 | -1.30% | default >> jython | 18938 | 18878 | 0.32% | default >> luindex | 1339 | 1325 | 1.06% | default >> lusearch | 918 | 936 | -1.92% | default >> pmd | 58291 | 58423 | -0.23% | large >> sunflow | 32617 | 24961 | 30.67% | large >> tomcat | 25481 | 25992 | -1.97% | large >> tradebeans | 314640 | 311706 | 0.94% | huge >> tradesoap | 107473 | 110246 | -2.52% | huge >> xalan | 6047 | 5882 | 2.81% | default >> zxing | 970 | 926 | 4.75% | default >> >> #### DaCapo/x86_64 >> >> The following measurements have been taken on an Intel Xeon Scalable Processors (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above. 
>> >> benchmark | baseline | fast-Locking | % | size >> -- | -- | -- | -- | -- >> avrora | 127690 | 126749 | 0.74% | large >> batik | 12736 | 12641 | 0.75% | large >> biojava | 15423 | 15404 | 0.12% | default >> eclipse | 41174 | 41498 | -0.78% | large >> fop | 2184 | 2172 | 0.55% | default >> graphchi | 1579 | 1560 | 1.22% | default >> h2 | 227614 | 230040 | -1.05% | huge >> jme | 8591 | 8398 | 2.30% | default >> jython | 13473 | 13356 | 0.88% | default >> luindex | 824 | 813 | 1.35% | default >> lusearch | 962 | 968 | -0.62% | default >> pmd | 40827 | 39654 | 2.96% | large >> sunflow | 53362 | 43475 | 22.74% | large >> tomcat | 27549 | 28029 | -1.71% | large >> tradebeans | 190757 | 190994 | -0.12% | huge >> tradesoap | 68099 | 67934 | 0.24% | huge >> xalan | 7969 | 8178 | -2.56% | default >> zxing | 1176 | 1148 | 2.44% | default >> >> #### Renaissance/AArch64 >> >> This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings. >> >> benchmark | baseline | fast-locking | % >> -- | -- | -- | -- >> AkkaUct | 2558.832 | 2513.594 | 1.80% >> Reactors | 14715.626 | 14311.246 | 2.83% >> Als | 1851.485 | 1869.622 | -0.97% >> ChiSquare | 1007.788 | 1003.165 | 0.46% >> GaussMix | 1157.491 | 1149.969 | 0.65% >> LogRegression | 717.772 | 733.576 | -2.15% >> MovieLens | 7916.181 | 8002.226 | -1.08% >> NaiveBayes | 395.296 | 386.611 | 2.25% >> PageRank | 4294.939 | 4346.333 | -1.18% >> FjKmeans | 496.076 | 493.873 | 0.45% >> FutureGenetic | 2578.504 | 2589.255 | -0.42% >> Mnemonics | 4898.886 | 4903.689 | -0.10% >> ParMnemonics | 4260.507 | 4210.121 | 1.20% >> Scrabble | 139.37 | 138.312 | 0.76% >> RxScrabble | 320.114 | 322.651 | -0.79% >> Dotty | 1056.543 | 1068.492 | -1.12% >> ScalaDoku | 3443.117 | 3449.477 | -0.18% >> ScalaKmeans | 259.384 | 258.648 | 0.28% >> Philosophers | 24333.311 | 23438.22 | 3.82% >> ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% >> FinagleChirper | 6814.192 | 6853.38 | -0.57% >> FinagleHttp | 4762.902 | 4807.564 | -0.93% >> >> #### Renaissance/x86_64 >> >> benchmark | baseline | fast-locking | % >> -- | -- | -- | -- >> AkkaUct | 1117.185 | 1116.425 | 0.07% >> Reactors | 11561.354 | 11812.499 | -2.13% >> Als | 1580.838 | 1575.318 | 0.35% >> ChiSquare | 459.601 | 467.109 | -1.61% >> GaussMix | 705.944 | 685.595 | 2.97% >> LogRegression | 659.944 | 656.428 | 0.54% >> MovieLens | 7434.303 | 7592.271 | -2.08% >> NaiveBayes | 413.482 | 417.369 | -0.93% >> PageRank | 3259.233 | 3276.589 | -0.53% >> FjKmeans | 946.429 | 938.991 | 0.79% >> FutureGenetic | 1760.672 | 1815.272 | -3.01% >> ParMnemonics | 2016.917 | 2033.101 | -0.80% >> Scrabble | 147.996 | 150.084 | -1.39% >> RxScrabble | 177.755 | 177.956 | -0.11% >> Dotty | 673.754 | 683.919 | -1.49% >> ScalaDoku | 2193.562 | 1958.419 | 12.01% >> ScalaKmeans | 165.376 | 168.925 | -2.10% >> ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% >> Philosophers | 14268.449 | 13308.87 | 7.21% >> FinagleChirper | 4722.13 | 4688.3 | 0.72% >> FinagleHttp | 3497.241 | 3605.118 | -2.99% >> >> Some renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing with running them much more often. >> >> I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. 
They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious). >> >> Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. >> >> ### Testing >> - [x] tier1 (x86_64, aarch64, x86_32) >> - [x] tier2 (x86_64, aarch64) >> - [x] tier3 (x86_64, aarch64) >> - [x] tier4 (x86_64, aarch64) >> - [x] jcstress 3-days -t sync -af GLOBAL (x86_64, aarch64) > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: > > - Merge remote-tracking branch 'upstream/master' into fast-locking > - Merge remote-tracking branch 'upstream/master' into fast-locking > - Merge remote-tracking branch 'upstream/master' into fast-locking > - More RISC-V fixes > - Merge remote-tracking branch 'origin/fast-locking' into fast-locking > - RISC-V port > - Revert "Re-use r0 in call to unlock_object()" > > This reverts commit ebbcb615a788998596f403b47b72cf133cb9de46. > - Merge remote-tracking branch 'origin/fast-locking' into fast-locking > - Fix number of rt args to complete_monitor_locking_C, remove some comments > - Re-use r0 in call to unlock_object() > - ... and 27 more: https://git.openjdk.org/jdk/compare/4b89fce0...3f0acba4 FYI: I am working on an alternative PR for this that makes fast-locking optional and opt-in behind an experimental switch. It will also be much less invasive (no structural changes except absolutely necessary, no cleanups) and thus easier to handle. ------------- PR: https://git.openjdk.org/jdk/pull/10590 From matsaave at openjdk.org Fri Oct 28 15:51:34 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 28 Oct 2022 15:51:34 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v4] In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. 
> > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 > 10 invokedynamic bsm=31 13 > BSM: REF_invokeStatic 32 > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 > 18 iinc #1 1 > 21 goto 2 > 24 return Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: - Improved constant pool printing - Added helper functions for constant pool ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10860/files - new: https://git.openjdk.org/jdk/pull/10860/files/452ef598..52d131ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=02-03 Stats: 94 lines in 3 files changed: 67 ins; 15 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/10860.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10860/head:pull/10860 PR: https://git.openjdk.org/jdk/pull/10860 From sspitsyn at openjdk.org Fri Oct 28 18:10:27 2022 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 28 Oct 2022 18:10:27 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v3] In-Reply-To: 
<6_2PU8XR4HUKXpssD2310mllx-yajMo6M3f8H6CvasY=.7d2c2123-918e-42d2-bc0b-80253a722fee@github.com> References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> <6_2PU8XR4HUKXpssD2310mllx-yajMo6M3f8H6CvasY=.7d2c2123-918e-42d2-bc0b-80253a722fee@github.com> Message-ID: On Fri, 28 Oct 2022 15:28:42 GMT, Roman Kennke wrote: >> There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: >> - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. >> - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. >> >> Testing: >> - [x] GHA (x86 and x-compile failures look like infra glitch) >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [x] tier4 > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into JDK-8295849 > - Fix has_owner() condition > - Improve condition in OM::has_owner() > - Fix OM::has_owner() > - 8295849: Consolidate Threads::owning_thread* Dan, thank you for running tests! ------------- PR: https://git.openjdk.org/jdk/pull/10849 From matsaave at openjdk.org Fri Oct 28 19:18:27 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 28 Oct 2022 19:18:27 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v5] In-Reply-To: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: > As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. > > The text format and contents are tentative, please review. 
> > Here is an example output when using `findmethod()`: > > "Executing findmethod" > flags (bitmask): > 0x01 - print names of methods > 0x02 - print bytecodes > 0x04 - print the address of bytecodes > 0x08 - print info for invokedynamic > 0x10 - print info for invokehandle > > [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} > 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V > 0 iconst_0 > 1 istore_1 > 2 iload_1 > 3 iconst_2 > 4 if_icmpge 24 > 7 getstatic 7 > 10 invokedynamic bsm=31 13 > BSM: REF_invokeStatic 32 > arguments[1] = { > 000 > } > ConstantPoolCacheEntry: 4 > - this: 0x00007fffa0400570 > - bytecode 1: invokedynamic ba > - bytecode 2: nop 00 > - cp index: 13 > - F1: [ 0x00000008000c8658] > - F2: [ 0x0000000000000003] > - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) > - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] > - tos: object > - local signature: 1 > - has appendix: 1 > - forced virtual: 0 > - final: 1 > - virtual Final: 0 > - resolution Failed: 0 > - num Parameters: 02 > Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > appendix: java.lang.invoke.BoundMethodHandle$Species_LL > {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' > - ---- fields (total size 5 words): > - private 'customizationCount' 'B' @12 0 (0x00) > - private volatile 'updateInProgress' 'Z' @13 false (0x00) > - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) > - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) > - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) > - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) > - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) > - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) > ------------- > 15 putstatic 17 > 18 iinc #1 1 > 21 goto 2 > 24 return Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Added resolution information for fields ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10860/files - new: https://git.openjdk.org/jdk/pull/10860/files/52d131ad..83a6cced Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10860&range=03-04 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10860.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10860/head:pull/10860 PR: https://git.openjdk.org/jdk/pull/10860 From duke at openjdk.org Fri Oct 28 19:52:03 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 28 Oct 2022 19:52:03 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> 
Message-ID: On Thu, 27 Oct 2022 05:10:59 GMT, Jatin Bhateja wrote: >> vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: >> >> extra whitespace character > > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175: > >> 173: // Choice of 1024 is arbitrary, need enough data blocks to amortize conversion overhead >> 174: // and not affect platforms without intrinsic support >> 175: int blockMultipleLength = (len/BLOCK_LENGTH) * BLOCK_LENGTH; > > Since Poly processes 16 byte chunks, a strength reduced version of above expression could be len & (~(BLOCK_LEN-1) I guess I got no issue with either version.. I was mostly thinking about code clarity? I think your version is 'more reliable' so just gonna switch it, thanks. > test/micro/org/openjdk/bench/javax/crypto/full/Poly1305DigestBench.java line 94: > >> 92: throw new RuntimeException(ex); >> 93: } >> 94: } > > On CLX patch shows performance regression of about 10% for block size 1024-2048+. > > CLX (Non-IFMA target) > > Baseline (JDK-20):- > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 2 3128928.978 ops/s > Poly1305DigestBench.digest 256 thrpt 2 1526452.083 ops/s > Poly1305DigestBench.digest 1024 thrpt 2 509267.401 ops/s > Poly1305DigestBench.digest 2048 thrpt 2 305784.922 ops/s > Poly1305DigestBench.digest 4096 thrpt 2 142175.885 ops/s > Poly1305DigestBench.digest 8192 thrpt 2 72142.906 ops/s > Poly1305DigestBench.digest 16384 thrpt 2 36357.000 ops/s > Poly1305DigestBench.digest 1048576 thrpt 2 676.142 ops/s > > > Withopt: > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 2 3136204.416 ops/s > Poly1305DigestBench.digest 256 thrpt 2 1683221.124 ops/s > Poly1305DigestBench.digest 1024 thrpt 2 457432.172 ops/s > Poly1305DigestBench.digest 2048 thrpt 2 277563.817 ops/s > Poly1305DigestBench.digest 4096 thrpt 2 149393.357 ops/s > Poly1305DigestBench.digest 8192 thrpt 2 79463.734 ops/s > Poly1305DigestBench.digest 16384 thrpt 2 41083.730 ops/s > Poly1305DigestBench.digest 1048576 thrpt 2 705.419 ops/s Odd, I measured it on `11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz`, will go again ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Oct 28 20:23:41 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 28 Oct 2022 20:23:41 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Thu, 27 Oct 2022 09:33:32 GMT, Jatin Bhateja wrote: >> vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: >> >> extra whitespace character > > src/hotspot/cpu/x86/macroAssembler_x86_poly.cpp line 849: > >> 847: jcc(Assembler::less, L_process16Loop); >> 848: >> 849: poly1305_process_blocks_avx512(input, length, > > Since entire code is based on 512 bit encoding misalignment penalty may be costly here. A scalar peel handling (as done in tail) for input portion before a 64 byte aligned address could further improve the performance for large block sizes. Hmm.. interesting. Is this for loading? `evmovdquq` vs `evmovdqaq`? I was actually looking at using evmovdqaq but there is no encoding for it yet (And just looking now on uops.info, they seem to have identical timings? perhaps their measurements are off..). 
There are quite a few optimizations I tried (and removed) here, but not this one.. Perhaps to have a record, while it's relatively fresh in my mind.. since there is an 8-block multiply (I deleted a 16-block vector multiply), one can have a peeled-off version for just 256 bytes as the minimum payload.. In that case we only need R^1..R^8 (not R^1..R^16). I also tried a loop stride of 8 blocks instead of 16, but that gets quite a bit slower (20ish%?).. There was also a version that did a much better interleaving of multiplication and loading of the next message block into limbs.. There is potentially a better way to 'devolve' the vector loop at the tail; i.e. when 15 blocks are left, just do one more 8-block multiply, since all the constants are already available.. I removed all of those eventually. Even then, the assembler code is already fairly complex. With the extra pre-/post-processing and if cases, I was struggling to keep up with it myself. Maybe code cleanup would have helped, so it _is_ possible to bring some of that back in for an extra 10+%? (There is a branch on my fork with that code.) I guess that's my long way of saying 'I don't want to complicate the assembler loop'? ------------- PR: https://git.openjdk.org/jdk/pull/10582 From rkennke at openjdk.org Fri Oct 28 20:24:38 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 28 Oct 2022 20:24:38 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v2] In-Reply-To: References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: On Fri, 28 Oct 2022 15:15:15 GMT, Daniel D. Daugherty wrote: >>> This is nice simplification. It looks pretty good. To be more safe, I'd suggest to run tier5 as well. Thanks, Serguei >> >> AFAIK, there is no tier5 (in OpenJDK). If you have something internal, then please give it a spin? >> >> Thanks, >> Roman > > @rkennke - can you merge the latest jdk bits into this PR? After that I'll > do more Mach5 testing for you... Thanks, @dcubed-ojdk and @sspitsyn! ------------- PR: https://git.openjdk.org/jdk/pull/10849 From rkennke at openjdk.org Fri Oct 28 20:26:24 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 28 Oct 2022 20:26:24 GMT Subject: Integrated: 8295849: Consolidate Threads::owning_thread* In-Reply-To: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> Message-ID: On Tue, 25 Oct 2022 11:39:37 GMT, Roman Kennke wrote: > There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: > - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. > - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous').
It's also a little cleaner IMO. > > Testing: > - [x] GHA (x86 and x-compile failures look like infra glitch) > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [x] tier4 This pull request has now been integrated. Changeset: a44ebd5f Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/a44ebd5fbc164ccdd2cc9a64739776ebaa0a8011 Stats: 68 lines in 7 files changed: 18 ins; 36 del; 14 mod 8295849: Consolidate Threads::owning_thread* Reviewed-by: dcubed, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/10849 From dcubed at openjdk.org Fri Oct 28 20:30:13 2022 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 28 Oct 2022 20:30:13 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v3] In-Reply-To: <6_2PU8XR4HUKXpssD2310mllx-yajMo6M3f8H6CvasY=.7d2c2123-918e-42d2-bc0b-80253a722fee@github.com> References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> <6_2PU8XR4HUKXpssD2310mllx-yajMo6M3f8H6CvasY=.7d2c2123-918e-42d2-bc0b-80253a722fee@github.com> Message-ID: On Fri, 28 Oct 2022 15:28:42 GMT, Roman Kennke wrote: >> There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: >> - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. >> - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. >> >> Testing: >> - [x] GHA (x86 and x-compile failures look like infra glitch) >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [x] tier4 > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into JDK-8295849 > - Fix has_owner() condition > - Improve condition in OM::has_owner() > - Fix OM::has_owner() > - 8295849: Consolidate Threads::owning_thread* I wasn't expecting this PR to integrate until after I posted the latest test results... ------------- PR: https://git.openjdk.org/jdk/pull/10849 From duke at openjdk.org Fri Oct 28 20:39:44 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 28 Oct 2022 20:39:44 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6] In-Reply-To: References: Message-ID: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 message blocks at a time. For more details, left a lot of comments in `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and java. > - Would like to add an `InvalidKeyException` in `Poly1305.java` (see commented out block in that file), but that conflicts with the KAT. 
I do think we should detect (R==0 || S ==0) so would like advice please. > - Added a JMH perf test. > - JMH test had to use reflection (instead of existing `MacBench.java`), since Poly1305 is not 'properly' registered with the provider. > > Perf before: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2961300.661 ? 110554.162 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1791912.962 ? 86696.037 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 637413.054 ? 14074.655 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 48762.991 ? 390.921 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 769.872 ? 1.402 ops/s > > and after: > > Benchmark (dataSize) (provider) Mode Cnt Score Error Units > Poly1305DigestBench.digest 64 thrpt 8 2841243.668 ? 154528.057 ops/s > Poly1305DigestBench.digest 256 thrpt 8 1662003.873 ? 95253.445 ops/s > Poly1305DigestBench.digest 1024 thrpt 8 1770028.718 ? 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ? 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ? 56.147 ops/s vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: invalidkeyexception and some review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10582/files - new: https://git.openjdk.org/jdk/pull/10582/files/883be106..78fd8fd7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10582&range=04-05 Stats: 33 lines in 7 files changed: 5 ins; 1 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/10582.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10582/head:pull/10582 PR: https://git.openjdk.org/jdk/pull/10582 From redestad at openjdk.org Fri Oct 28 20:48:10 2022 From: redestad at openjdk.org (Claes Redestad) Date: Fri, 28 Oct 2022 20:48:10 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops Message-ID: Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. With the most recent fixes the x64 intrinsic results on my workstation look like this: Benchmark (size) Mode Cnt Score Error Units StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op Baseline: Benchmark (size) Mode Cnt Score Error Units StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 
0.122 ns/op StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op I.e. no measurable overhead compared to baseline even for `size == 1`. The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. Benchmark for `Arrays.hashCode`: Benchmark (size) Mode Cnt Score Error Units ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op Baseline: Benchmark (size) Mode Cnt Score Error Units ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. ------------- Commit messages: - ws - Add ArraysHashCode microbenchmarks - Fixed vector loops for int and char arrays - Split up Arrays/HashCode tests - Fixes, optimized short inputs, temporarily disabled vector loop for Arrays.hashCode cases, added and improved tests - typo - Add Arrays.hashCode tests, enable intrinsic by default on x86 - Correct start values for array hashCode methods - Merge branch 'master' into 8282664-polyhash - Fold identical ops; only add coef expansion for Arrays cases - ... 
and 28 more: https://git.openjdk.org/jdk/compare/303548ba...22fec5f0 Changes: https://git.openjdk.org/jdk/pull/10847/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8282664 Stats: 1129 lines in 32 files changed: 1071 ins; 32 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From luhenry at openjdk.org Fri Oct 28 20:48:10 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 28 Oct 2022 20:48:10 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops In-Reply-To: References: Message-ID: On Tue, 25 Oct 2022 10:37:40 GMT, Claes Redestad wrote: > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 
0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. I did a quick write up explaining the approach at https://gist.github.com/luhenry/2fc408be6f906ef79aaf4115525b9d0c. Also, you can find details in @richardstartin's [blog post](https://richardstartin.github.io/posts/vectorised-polynomial-hash-codes) ------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Fri Oct 28 20:48:12 2022 From: redestad at openjdk.org (Claes Redestad) Date: Fri, 28 Oct 2022 20:48:12 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops In-Reply-To: References: Message-ID: <2URA7qiWBkx-l9U0FfNIBNOVyDeToiv8x0fmhHKhGOs=.edad5b57-0986-41ca-83f1-256021f5ec11@github.com> On Tue, 25 Oct 2022 10:37:40 GMT, Claes Redestad wrote: > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 
7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. While there are some incompleteness (no vectorization of byte and short arrays) I think this is ready to begin reviewing now. Implementing vectorization properly for byte and short arrays can be done as a follow-up, or someone might now a way to sign-extend subword integers properly that fits easily into the intrinsic implementation here. Porting to aarch64 and other platforms can be done as follow-ups and shouldn't block integration. 
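For anyone following along, the scalar shape all of this reduces to is the plain 31-based polynomial hash; below is a rough 4-way unrolled Java sketch of the idea from the write-ups linked earlier in the thread (illustrative only; the intrinsic picks its own unroll factor and vectorization strategy, and this sketch sign-extends bytes the way `Arrays.hashCode(byte[])` does, whereas `StringLatin1` masks with 0xff):

    static int polyHash(byte[] a) {
        int h = 0;
        int i = 0;
        // Fold four iterations of h = 31*h + a[i] into one step:
        // h = 31^4*h + 31^3*a[i] + 31^2*a[i+1] + 31*a[i+2] + a[i+3]
        for (; i + 3 < a.length; i += 4) {
            h = 31 * 31 * 31 * 31 * h
              + 31 * 31 * 31 * a[i]
              + 31 * 31 * a[i + 1]
              + 31 * a[i + 2]
              + a[i + 3];
        }
        for (; i < a.length; i++) {   // scalar tail
            h = 31 * h + a[i];
        }
        return h;
    }

The open byte/short question above is exactly how to do that sign extension of `a[i]` in vector form.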
------------- PR: https://git.openjdk.org/jdk/pull/10847 From duke at openjdk.org Fri Oct 28 21:06:18 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 28 Oct 2022 21:06:18 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Mon, 24 Oct 2022 23:38:16 GMT, Sandhya Viswanathan wrote: >> vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: >> >> extra whitespace character > > src/hotspot/cpu/x86/assembler_x86.cpp line 8306: > >> 8304: assert(dst != xnoreg, "sanity"); >> 8305: InstructionMark im(this); >> 8306: InstructionAttr attributes(vector_len, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ false, /* uses_vl */ true); > > no_mask_reg should be set to true here as we are not setting the mask register here. done > src/hotspot/cpu/x86/stubRoutines_x86.cpp line 83: > >> 81: address StubRoutines::x86::_join_2_3_base64 = NULL; >> 82: address StubRoutines::x86::_decoding_table_base64 = NULL; >> 83: address StubRoutines::x86::_poly1305_mask_addr = NULL; > > Please also update the copyright year to 2022 for stubRoutines_x86.cpp and hpp files. done. (hpp seemed ok) > src/hotspot/cpu/x86/vm_version_x86.cpp line 925: > >> 923: _features &= ~CPU_AVX512_VBMI2; >> 924: _features &= ~CPU_AVX512_BITALG; >> 925: _features &= ~CPU_AVX512_IFMA; > > This should also be done under is_knights_family(). done ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Oct 28 21:06:19 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 28 Oct 2022 21:06:19 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: <4FY4SEodgFcdxFXvGWFJWHYCr1GD4nAktLa5SiyPcxM=.384b2818-b6c5-4523-8682-5b730d9ad036@github.com> References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> <4FY4SEodgFcdxFXvGWFJWHYCr1GD4nAktLa5SiyPcxM=.384b2818-b6c5-4523-8682-5b730d9ad036@github.com> Message-ID: On Wed, 26 Oct 2022 15:47:28 GMT, vpaprotsk wrote: >> src/hotspot/cpu/x86/macroAssembler_x86_poly.cpp line 806: >> >>> 804: evmovdquq(A0, Address(rsp, 64*0), Assembler::AVX_512bit); >>> 805: evmovdquq(A0, Address(rsp, 64*1), Assembler::AVX_512bit); >>> 806: evmovdquq(A0, Address(rsp, 64*2), Assembler::AVX_512bit); >> >> This is load from stack into A0. Did you intend to store A0 (cleanup) into stack local area here? I think the source and destination are mixed up here. > > Wow! 
Thank you for spotting this done ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Oct 28 21:06:21 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 28 Oct 2022 21:06:21 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Thu, 27 Oct 2022 09:29:52 GMT, Jatin Bhateja wrote: >> vpaprotsk has updated the pull request incrementally with one additional commit since the last revision: >> >> extra whitespace character > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2040: > >> 2038: >> 2039: address StubGenerator::generate_poly1305_processBlocks() { >> 2040: __ align64(); > > This can be replaced by __ align(CodeEntryAlignment); done ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Oct 28 21:06:21 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 28 Oct 2022 21:06:21 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: <4AB7TAZwydDonBwfxasMLmgVIQuaLgMUxck7eCbzYxw=.a9062602-90d4-4bde-baff-629bea466527@github.com> On Thu, 27 Oct 2022 21:19:06 GMT, Jamil Nimeh wrote: >>> 10% is not a negligible impact. I see your point about AVX512 reaping the rewards of this change, but there are plenty of x86_64 systems without AVX512 that will be impacted, not to mention other platforms like aarch64 which (for this change at least) will never see the benefits from the intrinsic. >>> >>> I don't have any suggestions right at this moment for how this could be streamlined at all to help reduce the pain for non-AVX512 systems. Worth looking into though. >> >> Do you suggest using white box APIs for CPU feature query during poly static initialization and perform multi block processing only for relevant platforms and keep the original implementation sacrosanct for other targets. VM does offer native white box primitives and currently its being used by tests infrastructure. > > No, going the WhiteBox route was not something I was thinking of. I sought feedback from a couple hotspot-knowledgable people about the use of WhiteBox APIs and both felt that it was not the right way to go. One said that WhiteBox is really for VM testing and not for these kinds of java classes. One idea I was trying to measure was to make the intrinsic (i.e. the while loop remains exactly the same, just moved to different =non-static= function): private void processMultipleBlocks(byte[] input, int offset, int length) { //, MutableIntegerModuloP A, IntegerModuloP R) { while (length >= BLOCK_LENGTH) { n.setValue(input, offset, BLOCK_LENGTH, (byte)0x01); a.setSum(n); // A += (temp | 0x01) a.setProduct(r); // A = (A * R) % p offset += BLOCK_LENGTH; length -= BLOCK_LENGTH; } } In principle, the java version would not get any slower (i.e. there is only one extra function jump). At the expense of the C++ glue getting more complex. In C++ I need to dig out using IR `(sun.security.util.math.intpoly.IntegerPolynomial.MutableElement)(this.a).limbs` then convert 5*26bit limbs into 3*44-bit limbs. The IR is very new to me so will take some time. (I think I found some AES code that does something similar). That said.. I thought this idea would had been perhaps a separate PR, if needed at all.. 
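For the record, the 5x26-bit to 3x44-bit repacking itself is only shifts and masks. A rough sketch, assuming fully reduced, non-negative 26-bit limbs (the names `l0..l4` and `repack26to44` are mine, not the actual field layout, and the real limbs live in a `long[]` that may hold a redundant representation):

    // value = l0 + l1*2^26 + l2*2^52 + l3*2^78 + l4*2^104   (five 26-bit limbs)
    // repacked as m[0] + m[1]*2^44 + m[2]*2^88               (44/44/42-bit limbs)
    static long[] repack26to44(long l0, long l1, long l2, long l3, long l4) {
        final long MASK44 = (1L << 44) - 1;
        long v0 = l0 | (l1 << 26);                       // value bits 0..51
        long v1 = (v0 >>> 44) | (l2 << 8) | (l3 << 34);  // value bits 44..103, shifted down by 44
        long m0 = v0 & MASK44;                           // bits 0..43
        long m1 = v1 & MASK44;                           // bits 44..87
        long m2 = (v1 >>> 44) | (l4 << 16);              // bits 88..129
        return new long[] { m0, m1, m2 };
    }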
Digging limbs out is one thing, but I also need to add asserts and safety. Mostly I would be happy to just measure if it's worth it. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From duke at openjdk.org Fri Oct 28 21:06:26 2022 From: duke at openjdk.org (vpaprotsk) Date: Fri, 28 Oct 2022 21:06:26 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: On Wed, 26 Oct 2022 15:27:55 GMT, vpaprotsk wrote: >> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 296: >> >>> 294: keyBytes[12] &= (byte)252; >>> 295: >>> 296: // This should be enabled, but Poly1305KAT would fail >> >> I'm on the fence about this change. I have no problem with it in basic terms. If we ever decided to make this a general purpose Mac in JCE then this would definitely be good to do. As of right now, the only consumer is ChaCha20 and it would submit a key through the process in the RFC. Seems really unlikely to run afoul of these checks, but admittedly not impossible. >> >> I would agree with @sviswa7 that we could examine this in a separate change and we could look at other approaches to getting around the KAT issue, perhaps some package-private based way to disable the check. As long as Poly1305 remains with package-private visibility, one could make another form of the constructor with a boolean that would disable this check and that is the constructor that the KAT would use. This is just an off-the-cuff idea, but one way we might get the best of both worlds. >> >> If we move this down the road then we should remove the commenting. We can refer back to this PR later. > I think I will remove the check for now, dont want to hold up reviews. I wasn't sure how to 'inject a backdoor' to the commented out check either, or at least how to do it in an acceptable way. Your ideas do sound plausible, and if anyone does want this check, I can implement one of the ideas (package private boolean flag? turn it on in the test) while waiting for more reviews to come in. > > The comment about ChaCha being the only way in is also relevant, thanks. i.e. this is a private class today. I flip-flopped on this.. I already had the code for the exception.. and already described the potential fix. So rather than remove the code, I pushed the described fix. It's always easier to remove the extra field I added.
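Roughly, the guarded check has this shape (a sketch only, not the exact diff; `checkWeakKey` and `allZero` are stand-in names, and `keyBytes` is the 32-byte key with r in the first 16 bytes and s in the last 16):

    // After clamping r: with r == 0 the tag collapses to s regardless of the
    // message, and with s == 0 the final addition of s mod 2^128 is lost.
    if (checkWeakKey) {
        if (allZero(keyBytes, 0, 16) || allZero(keyBytes, 16, 32)) {
            throw new InvalidKeyException("R and S must not be zero");
        }
    }

    private static boolean allZero(byte[] b, int from, int to) {
        int acc = 0;
        for (int i = from; i < to; i++) {
            acc |= b[i];
        }
        return acc == 0;
    }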
Let me know what you think about the 'backdoor' field. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From tsteele at openjdk.org Fri Oct 28 21:13:28 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Fri, 28 Oct 2022 21:13:28 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v14] In-Reply-To: References: Message-ID: On Thu, 27 Oct 2022 23:46:43 GMT, Tyler Steele wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > Tyler Steele has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. Apologies for the force-push and associated chaos. I believe the best course of action is to merge this PR as it was before the suggestions by fisk (which are appreciated nonetheless), as that PR solves the more immediate problem of fixing the build on s390. I have incorporated fisk's suggestions in [a separate PR](https://github.com/openjdk/jdk/pull/10909), where I propose we continue development. In the new PR, I have integrated Martin & Lutz' improvements to my s390 assembly given in the reviews above. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From jnimeh at openjdk.org Fri Oct 28 21:58:30 2022 From: jnimeh at openjdk.org (Jamil Nimeh) Date: Fri, 28 Oct 2022 21:58:30 GMT Subject: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5] In-Reply-To: References: <9h52z_DWFvTWWwasN7vzl9-7C0-Tj50Cis4fgRNuId8=.65de1f73-f5f3-4326-b9e0-6211861452ea@github.com> Message-ID: <0xJMPRdK0h3UJBYxqeLMfp1baL8xoaUpNcAZOtrFLKo=.d5c1020e-9e61-4800-bb52-9adbdd17e19f@github.com> On Fri, 28 Oct 2022 21:03:32 GMT, vpaprotsk wrote: >> I think I will remove the check for now, dont want to hold up reviews. I wasn't sure how to 'inject a backdoor' to the commented out check either, or at least how to do it in an acceptable way. Your ideas do sound plausible, and if anyone does want this check, I can implement one of the ideas (package private boolean flag? turn it on in the test) while waiting for more reviews to come in. >> >> The comment about ChaCha being the only way in is also relevant, thanks. i.e. this is a private class today. > > I flipped-flopped on this.. I already had the code for the exception.. and already described the potential fix. So rather then remove the code, pushed the described fix. Its always easier to remove the extra field I added. Let me know what you think about the 'backdoor' field. Well, what you're doing achieves what we're looking for, thanks for making that change. I think I'd like to see that value set on construction and not be mutable from outside the object. Something like this: - place a `private final boolean checkWeakKey` up near where all the other fields are defined. - the no-args Poly1305 is implemented as `this(true)` - an additional constructor is created `Poly1305(boolean checkKey)` which sets `checkWeakKey` true or false as provided by the parameter. - in setRSVals you should be able to wrap lines 296-310 inside a single `if (checkWeakKey)` block. - In the Poly1305KAT the `new Poly1305()` becomes `new Poly1305(false)`. ------------- PR: https://git.openjdk.org/jdk/pull/10582 From dcubed at openjdk.org Sat Oct 29 02:41:32 2022 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Sat, 29 Oct 2022 02:41:32 GMT Subject: RFR: 8295849: Consolidate Threads::owning_thread* [v3] In-Reply-To: <6_2PU8XR4HUKXpssD2310mllx-yajMo6M3f8H6CvasY=.7d2c2123-918e-42d2-bc0b-80253a722fee@github.com> References: <51ZjzMcue4BzL9R5Utyr198fkk2ns0PtCqqMKL3UiDw=.0792e8f3-50c1-4801-9097-c6775c121a0a@github.com> <6_2PU8XR4HUKXpssD2310mllx-yajMo6M3f8H6CvasY=.7d2c2123-918e-42d2-bc0b-80253a722fee@github.com> Message-ID: On Fri, 28 Oct 2022 15:28:42 GMT, Roman Kennke wrote: >> There are several users and even mostly-identical implementations of Threads::owning_thread_from_monitor_owner(), which I would like to consolidate a little in preparation of JDK-8291555: >> - JvmtiEnvBase::get_monitor_usage(): As the comment in ObjectSynchronizer::get_lock_owner() suggests, the JVMTI code should call the ObjectSynchronizer method. The only real difference is that JVMTI loads the object header directly while OS spins to avoid INFLATING. This is harmless, because JVMTI calls from safepoint, where INFLATING does not occur, and would just do a simple load of the header. A little care must be taken to fetch the monitor if exists a few lines below, to fill in monitor info. >> - Two ThreadService methods call Threads::owning_thread_from_monitor_owner(), but always only ever from a monitor. I would like to extract that special case because with fast-locking this can be treated differently (with fast-locking, monitor owners can only be JavaThread* or 'anonynmous'). It's also a little cleaner IMO. >> >> Testing: >> - [x] GHA (x86 and x-compile failures look like infra glitch) >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [x] tier4 > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into JDK-8295849 > - Fix has_owner() condition > - Improve condition in OM::has_owner() > - Fix OM::has_owner() > - 8295849: Consolidate Threads::owning_thread* Mach5 testing for the v02 version of this fix: Mach5 Tier1: - no test or task failures Mach5 Tier2: - no test or task failures Mach5 Tier3: - no test or task failures Mach5 Tier4: - 1 known, unrelated test failure: - no task failures Mach5 Tier5: - 1 test suite issue due to a corrupted download that resulted in 240 test failures - no tasks failed Mach5 Tier6: - no test failures and no task failures Mach5 Tier7: - 1 known, unrelated test failure - no task failures Mach5 Tier8: - skipped for the v02 version since the PR is already integrated ------------- PR: https://git.openjdk.org/jdk/pull/10849 From duke at openjdk.org Sat Oct 29 09:28:25 2022 From: duke at openjdk.org (Piotr Tarsa) Date: Sat, 29 Oct 2022 09:28:25 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops In-Reply-To: <2URA7qiWBkx-l9U0FfNIBNOVyDeToiv8x0fmhHKhGOs=.edad5b57-0986-41ca-83f1-256021f5ec11@github.com> References: <2URA7qiWBkx-l9U0FfNIBNOVyDeToiv8x0fmhHKhGOs=.edad5b57-0986-41ca-83f1-256021f5ec11@github.com> Message-ID: On Fri, 28 Oct 2022 20:43:04 GMT, Claes Redestad wrote: > Porting to aarch64 and other platforms can be done as follow-ups and shouldn't block integration. I'm not an expert in JVM internals, but there's an already seemingly working String.hashCode intrinsification that's ISA independent: https://github.com/openjdk/jdk/pull/6658 It operates on higher level than direct assembly instructions, i.e. 
it operates on the ISA-independent vector nodes, so that all hardware platforms that support vectorization would get speedup (i.e. x86-64, x86-32, arm32, arm64, etc), therefore reducing manual work to get all of them working. I wonder why that pull request got no visible interest? Forgive me if I got something wrong :) ------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Sat Oct 29 10:38:08 2022 From: redestad at openjdk.org (Claes Redestad) Date: Sat, 29 Oct 2022 10:38:08 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops In-Reply-To: References: Message-ID: On Tue, 25 Oct 2022 10:37:40 GMT, Claes Redestad wrote: > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 
50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. > > Porting to aarch64 and other platforms can be done as follow-ups and shouldn't block integration. > > I'm not an expert in JVM internals, but there's an already seemingly working String.hashCode intrinsification that's ISA independent: #6658 It operates on higher level than direct assembly instructions, i.e. it operates on the ISA-independent vector nodes, so that all hardware platforms that support vectorization would get speedup (i.e. x86-64, x86-32, arm32, arm64, etc), therefore reducing manual work to get all of them working. I wonder why that pull request got no visible interest? > > Forgive me if I got something wrong :) I'll have to ask @merykitty why that patch was stalled. Never appeared on my radar until now -- thanks! The approach to use the library call kit API is promising since it avoids the need to port. And with similar results. I'll see if we can merge the approach here of having a shared intrinsic for `Arrays` and `String`, and bring in an ISA-independent backend implementation as in #6658 ------------- PR: https://git.openjdk.org/jdk/pull/10847 From qamai at openjdk.org Sat Oct 29 15:16:34 2022 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 29 Oct 2022 15:16:34 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops In-Reply-To: References: Message-ID: On Tue, 25 Oct 2022 10:37:40 GMT, Claes Redestad wrote: > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. 
We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. I am planning to submit that patch after finishing with the current under-reviewed PRs. That patch was stalled because there was no node for vectorised unsigned cast and constant values. 
The first one has been added and the second one may be worked around as in the PR. I also thought of using masked loads for tail processing instead of falling back to scalar implementation. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Sun Oct 30 19:21:28 2022 From: redestad at openjdk.org (Claes Redestad) Date: Sun, 30 Oct 2022 19:21:28 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops In-Reply-To: References: Message-ID: On Sat, 29 Oct 2022 15:11:56 GMT, Quan Anh Mai wrote: > I am planning to submit that patch after finishing with the current under-reviewed PRs. That patch was stalled because there was no node for vectorised unsigned cast and constant values. The first one has been added and the second one may be worked around as in the PR. I also thought of using masked loads for tail processing instead of falling back to scalar implementation. Ok, then I think we might as well move forward with this enhancement first. It'd establish some new tests, microbenchmarks as well as unifying the polynomial hash loops into a single intrinsic endpoint - while also putting back something that would be straightforward to backport (less dependencies on other recent enhancements). Then once the vector IR nodes have matured we can easily rip out the `VectorizedHashCodeNode` and replace it with such an implementation. WDYT? ------------- PR: https://git.openjdk.org/jdk/pull/10847 From qamai at openjdk.org Mon Oct 31 02:49:24 2022 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 31 Oct 2022 02:49:24 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops In-Reply-To: References: Message-ID: <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> On Tue, 25 Oct 2022 10:37:40 GMT, Claes Redestad wrote: > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 
0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. I agree, please go ahead, I leave some comments for the x86 implementation. Thanks. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3358: > 3356: movl(result, is_string_hashcode ? 0 : 1); > 3357: > 3358: // if (cnt1 == 0) { You may want to reorder the execution of the loops, a short array suffers more from processing than a big array, so you should have minimum extra hops for those. For example, I think this could be: if (cnt1 >= 4) { if (cnt1 >= 16) { UNROLLED VECTOR LOOP SINGLE VECTOR LOOP } UNROLLED SCALAR LOOP } SINGLE SCALAR LOOP The thresholds are arbitrary and need to be measured carefully. 
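For readers following the review, a rough Java rendering of the dispatch order suggested above; the thresholds (4 and 16), the method shape and the names are placeholders for illustration only, not the PR's actual code:

```
// Sketch only: short inputs fall through with as few branches as possible,
// long inputs would take the vector path.
static int hashCode(int initial, int[] a) {
    int h = initial;
    int i = 0, n = a.length;
    if (n >= 4) {
        if (n >= 16) {
            // UNROLLED VECTOR LOOP and SINGLE VECTOR LOOP would go here,
            // advancing i and leaving only a short tail for the loops below.
        }
        // UNROLLED SCALAR LOOP would go here (see the broken-dependency-chain
        // form sketched later in this thread).
    }
    for (; i < n; i++) {          // SINGLE SCALAR LOOP
        h = 31 * h + a[i];
    }
    return h;
}
```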
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3374: > 3372: > 3373: // int i = 0; > 3374: movl(index, 0); `xorl(index, index)` src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3387: > 3385: for (int idx = 0; idx < 4; idx++) { > 3386: // h = (31 * h) or (h << 5 - h); > 3387: movl(tmp, result); If you are unrolling this, maybe break the dependency chain, `h = h * 31**4 + x[i] * 31**3 + x[i + 1] * 31**2 + x[i + 2] * 31 + x[i + 3]` src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3418: > 3416: // } else { // cnt1 >= 32 > 3417: address power_of_31_backwards = pc(); > 3418: emit_int32( 2111290369); Can this giant table be shared among compilations instead? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3484: > 3482: decrementl(index); > 3483: jmpb(LONG_SCALAR_LOOP_BEGIN); > 3484: bind(LONG_SCALAR_LOOP_END); You can share this loop with the scalar ones above. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3493: > 3491: // vnext = IntVector.broadcast(I256, power_of_31_backwards[0]); > 3492: movdl(vnext, InternalAddress(power_of_31_backwards + (0 * sizeof(jint)))); > 3493: vpbroadcastd(vnext, vnext, Assembler::AVX_256bit); `vpbroadcastd` can take an `Address` argument instead. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3523: > 3521: subl(index, 32); > 3522: // i >= 0; > 3523: cmpl(index, 0); You don't need this since `subl` sets flags according to the result. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3528: > 3526: vpmulld(vcoef[idx], vcoef[idx], vnext, Assembler::AVX_256bit); > 3527: } > 3528: jmp(LONG_VECTOR_LOOP_BEGIN); Calculating backward forces you to do calculating the coefficients on each iteration, I think doing this normally would be better. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From qamai at openjdk.org Mon Oct 31 02:49:25 2022 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 31 Oct 2022 02:49:25 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops In-Reply-To: <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> References: <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> Message-ID: On Mon, 31 Oct 2022 02:12:22 GMT, Quan Anh Mai wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 
0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3387: > >> 3385: for (int idx = 0; idx < 4; idx++) { >> 3386: // h = (31 * h) or (h << 5 - h); >> 3387: movl(tmp, result); > > If you are unrolling this, maybe break the dependency chain, `h = h * 31**4 + x[i] * 31**3 + x[i + 1] * 31**2 + x[i + 2] * 31 + x[i + 3]` A 256-bit vector is only 8 ints so this loop seems redundant, maybe running with the stride of 2 instead, in which case the single scalar calculation does also not need a loop. 
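To make the dependency-chain suggestion above concrete, here is a minimal Java sketch of a 4-way unrolled scalar loop in that form; the constants are 31^2, 31^3 and 31^4 under ordinary int wrap-around, and the method name and signature are only illustrative:

```
static int hashUnrolled(int[] a, int initial) {
    int h = initial;
    int i = 0;
    for (; i + 3 < a.length; i += 4) {
        // The four products are independent of each other; only the final sum
        // depends on the previous h, which shortens the critical path compared
        // to four chained h = 31 * h + a[i] steps.
        h = h * 923521          // 31^4
          + a[i]     * 29791    // 31^3
          + a[i + 1] * 961      // 31^2
          + a[i + 2] * 31
          + a[i + 3];
    }
    for (; i < a.length; i++) {  // scalar tail
        h = 31 * h + a[i];
    }
    return h;
}
```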
------------- PR: https://git.openjdk.org/jdk/pull/10847 From dholmes at openjdk.org Mon Oct 31 06:09:26 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 31 Oct 2022 06:09:26 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v3] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On Fri, 21 Oct 2022 10:25:03 GMT, Stefan Karlsson wrote: >> Background to this patch: >> >> This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. >> >> PR RFC: >> >> HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: >> >> MetaspaceObj - allocates in the Metaspace >> CHeap - uses malloc >> ResourceObj - ... >> >> The last class sounds like it provide an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Areanas or even CHeap allocated memory. >> >> This is IMHO misleading, and often leads to confusion among HotSpot developers. >> >> I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. >> >> In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. >> >> The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. >> >> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge remote-tracking branch 'upstream/master' into 8295475_split_allocation_types > - Work around gtest exception compilation issues > - Fix Shenandoah > - Remove AnyObj new operator taking an allocation_type > - Use more specific allocation types In allocation.hpp we have this: 416: // Base class for classes that constitute name spaces. 417: 418: class Arena; The comment seems out of place as that belongs to `ALL_STATIC`. Nowhere do we seem to define/describe what an Arena is in this header file. ------------- PR: https://git.openjdk.org/jdk/pull/10745 From dholmes at openjdk.org Mon Oct 31 06:23:30 2022 From: dholmes at openjdk.org (David Holmes) Date: Mon, 31 Oct 2022 06:23:30 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v3] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On Fri, 21 Oct 2022 10:25:03 GMT, Stefan Karlsson wrote: >> Background to this patch: >> >> This prototype/patch has been discussed with a few HotSpot devs, and I've gotten feedback that I should send it out for broader discussion/review. 
It could be a first step to make it easier to talk about our allocation super classes and strategies. This in turn would make it easier to have further discussions around how to make our allocation strategies more flexible. E.g. do we really need to tie down utility classes to a specific allocation strategy? Do we really have to provide MEMFLAGS as compile time flags? Etc. >> >> PR RFC: >> >> HotSpot has a few allocation classes that other classes can inherit from to get different dynamic-allocation strategies: >> >> MetaspaceObj - allocates in the Metaspace >> CHeap - uses malloc >> ResourceObj - ... >> >> The last class sounds like it provide an allocation strategy to allocate inside a thread's resource area. This is true, but it also provides functions to allow the instances to be allocated in Areanas or even CHeap allocated memory. >> >> This is IMHO misleading, and often leads to confusion among HotSpot developers. >> >> I propose that we simplify ResourceObj to only provide an allocation strategy for resource allocations, and move the multi-allocation strategy feature to another class, which isn't named ResourceObj. >> >> In my proposal and prototype I've used the name AnyObj, as short, simple name. I'm open to changing the name to something else. >> >> The patch also adds a new class named ArenaObj, which is for objects only allocated in provided arenas. >> >> The patch also removes the need to provide ResourceObj/AnyObj::C_HEAP to `operator new`. If you pass in a MEMFLAGS argument it now means that you want to allocate on the CHeap. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge remote-tracking branch 'upstream/master' into 8295475_split_allocation_types > - Work around gtest exception compilation issues > - Fix Shenandoah > - Remove AnyObj new operator taking an allocation_type > - Use more specific allocation types I also like this as a cleanup of the existing allocation base classes. I too question whether allocation strategy should be a property of a type rather than an instance but I don't know whether that is possible/feasible? Perhaps some kind of placement new: MyStack heapStack = new (CHeap::allocate(sizeof(...)) MyStack(); ? But that could be done progressively in future enhancements. ------------- PR: https://git.openjdk.org/jdk/pull/10745 From stefank at openjdk.org Mon Oct 31 09:04:21 2022 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 31 Oct 2022 09:04:21 GMT Subject: RFR: 8295475: Move non-resource allocation strategies out of ResourceObj [v3] In-Reply-To: References: <4RakidFUe7jYYkY_1XkaBRuwJCxPd90CO1trC7QNzno=.18335453-ebc7-42b3-8973-d2ffefc47b53@github.com> Message-ID: On Mon, 31 Oct 2022 06:06:59 GMT, David Holmes wrote: > In allocation.hpp we have this: > > ``` > 416: // Base class for classes that constitute name spaces. > 417: > 418: class Arena; > ``` > > The comment seems out of place as that belongs to `ALL_STATIC`. > > Nowhere do we seem to define/describe what an Arena is in this header file. Yes, you are right. 
Here's the change that added `class Arena;` between the comment and the AllStatic class: https://github.com/openjdk/jdk/commit/d69af7b386da ------------- PR: https://git.openjdk.org/jdk/pull/10745 From mdoerr at openjdk.org Mon Oct 31 10:26:26 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 31 Oct 2022 10:26:26 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v2] In-Reply-To: <-0WuSW5EvM4cHDwBiIamjC3ltdYZzDNWgrTgwfQmfcg=.22c07703-48a3-444e-8d9f-ed119a35523e@github.com> References: <-0WuSW5EvM4cHDwBiIamjC3ltdYZzDNWgrTgwfQmfcg=.22c07703-48a3-444e-8d9f-ed119a35523e@github.com> Message-ID: On Thu, 27 Oct 2022 14:15:39 GMT, Tyler Steele wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Fixup comments Ok, this is the version which looks correct. Please consider moving the nmethod entry barrier implementation to the end such that you don't have to move it in your next PR. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.org/jdk/pull/10558 From eosterlund at openjdk.org Mon Oct 31 10:35:36 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 31 Oct 2022 10:35:36 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v2] In-Reply-To: <-0WuSW5EvM4cHDwBiIamjC3ltdYZzDNWgrTgwfQmfcg=.22c07703-48a3-444e-8d9f-ed119a35523e@github.com> References: <-0WuSW5EvM4cHDwBiIamjC3ltdYZzDNWgrTgwfQmfcg=.22c07703-48a3-444e-8d9f-ed119a35523e@github.com> Message-ID: On Thu, 27 Oct 2022 14:15:39 GMT, Tyler Steele wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Fixup comments Okay, looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/10558 From luhenry at openjdk.org Mon Oct 31 11:33:27 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 31 Oct 2022 11:33:27 GMT Subject: RFR: 8295948: Support for Zicbop/prefetch instructions on RISC-V In-Reply-To: References: Message-ID: <1yV4wylyTjD5-MuT7o_IylkxgU_O3xryd3cRlULgdbY=.9b73b7e9-5de1-47dc-ba6f-9dc30b400a9e@github.com> On Thu, 27 Oct 2022 15:18:02 GMT, Ludovic Henry wrote: > The OpenJDK supports generating prefetch instructions on most platforms. RISC-V supports through the Zicbop extension the use of prefetch instructions. We want to make sure we use these instructions whenever they are available. > > It passes `hotspot:tier1` test suite @RealFYang let me know what you think. Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/10884 From redestad at openjdk.org Mon Oct 31 12:07:43 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 31 Oct 2022 12:07:43 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2] In-Reply-To: References: Message-ID: > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. 
To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ? 
0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Reorder loops and some other suggestions from @merykitty ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10847/files - new: https://git.openjdk.org/jdk/pull/10847/files/22fec5f0..6aed1c1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=00-01 Stats: 110 lines in 1 file changed: 59 ins; 45 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Mon Oct 31 12:28:10 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 31 Oct 2022 12:28:10 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2] In-Reply-To: <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> References: <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> Message-ID: On Mon, 31 Oct 2022 02:34:06 GMT, Quan Anh Mai wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Reorder loops and some other suggestions from @merykitty > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3484: > >> 3482: decrementl(index); >> 3483: jmpb(LONG_SCALAR_LOOP_BEGIN); >> 3484: bind(LONG_SCALAR_LOOP_END); > > You can share this loop with the scalar ones above. This might be messier than it first looks, since the two different loops use different temp registers based (long scalar can scratch cnt1, short scalar scratches the coef register). I'll have to think about this for a bit. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3523: > >> 3521: subl(index, 32); >> 3522: // i >= 0; >> 3523: cmpl(index, 0); > > You don't need this since `subl` sets flags according to the result. Fixed ------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Mon Oct 31 12:32:34 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 31 Oct 2022 12:32:34 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2] In-Reply-To: <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> References: <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> Message-ID: On Mon, 31 Oct 2022 02:21:44 GMT, Quan Anh Mai wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Reorder loops and some other suggestions from @merykitty > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3358: > >> 3356: movl(result, is_string_hashcode ? 0 : 1); >> 3357: >> 3358: // if (cnt1 == 0) { > > You may want to reorder the execution of the loops, a short array suffers more from processing than a big array, so you should have minimum extra hops for those. 
For example, I think this could be: > > if (cnt1 >= 4) { > if (cnt1 >= 16) { > UNROLLED VECTOR LOOP > SINGLE VECTOR LOOP > } > UNROLLED SCALAR LOOP > } > SINGLE SCALAR LOOP > > The thresholds are arbitrary and need to be measured carefully. Fixed > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3374: > >> 3372: >> 3373: // int i = 0; >> 3374: movl(index, 0); > > `xorl(index, index)` Fixed > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3418: > >> 3416: // } else { // cnt1 >= 32 >> 3417: address power_of_31_backwards = pc(); >> 3418: emit_int32( 2111290369); > > Can this giant table be shared among compilations instead? Probably, though I'm not entirely sure on how. Maybe the "long" cases should be factored out into a set of stub routines so that it's not inlined in numerous places anyway. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Mon Oct 31 12:32:34 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 31 Oct 2022 12:32:34 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2] In-Reply-To: References: <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> Message-ID: On Mon, 31 Oct 2022 02:15:35 GMT, Quan Anh Mai wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3387: >> >>> 3385: for (int idx = 0; idx < 4; idx++) { >>> 3386: // h = (31 * h) or (h << 5 - h); >>> 3387: movl(tmp, result); >> >> If you are unrolling this, maybe break the dependency chain, `h = h * 31**4 + x[i] * 31**3 + x[i + 1] * 31**2 + x[i + 2] * 31 + x[i + 3]` > > A 256-bit vector is only 8 ints so this loop seems redundant, maybe running with the stride of 2 instead, in which case the single scalar calculation does also not need a loop. Working on this.. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Mon Oct 31 12:35:26 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 31 Oct 2022 12:35:26 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v3] In-Reply-To: References: Message-ID: > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 
7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. 
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Require UseSSE >= 3 due transitive use of sse3 instructions from ReduceI ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10847/files - new: https://git.openjdk.org/jdk/pull/10847/files/6aed1c1e..7e8a3e9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From fyang at openjdk.org Mon Oct 31 12:54:29 2022 From: fyang at openjdk.org (Fei Yang) Date: Mon, 31 Oct 2022 12:54:29 GMT Subject: RFR: 8286301: Port JEP 425 to RISC-V Message-ID: Hi, Please review this PR porting JEP 425 (Virtual Threads) to RISC-V. This is mainly adapted from the work of AArch64 port. Most of the changes lie in RISC-V scope. Changes to HotSpot shared code are trivial and are always guarded by RISCV64 macro. So this won't affect the rest of the world in theory. There exists some differences in frame structure between AArch64 and RISC-V. For AArch64, we have: enum { link_offset = 0, return_addr_offset = 1, sender_sp_offset = 2 }; While for RISC-V, we have: enum { link_offset = -2, return_addr_offset = -1, sender_sp_offset = 0 }; So we need adapations in some places where the code relies on value of sender_sp_offset to work. Implementation for Post-call NOPs optimization is not incorporated in this PR as we plan to evaluate more on its impact on performance. Testing on Linux-riscv64 HiFive Unmatched board: - minimal, client and server release & fastdebug build OK. - Passed tier1-tier4 tests (release build). - Passed jtreg tests under test/jdk/java/lang/Thread/virtual with extra JVM options: -XX:+VerifyContinuations -XX:+VerifyStack (fastdebug build). - Performed benchmark tests like Dacapo, SPECjvm2008, SPECjbb2015, etc. to make sure no performance regression are introduced (release build). ------------- Commit messages: - 8286301: JEP 425 to RISC-V Changes: https://git.openjdk.org/jdk/pull/10917/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10917&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8286301 Stats: 1260 lines in 30 files changed: 1008 ins; 82 del; 170 mod Patch: https://git.openjdk.org/jdk/pull/10917.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10917/head:pull/10917 PR: https://git.openjdk.org/jdk/pull/10917 From coleenp at openjdk.org Mon Oct 31 12:55:31 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 31 Oct 2022 12:55:31 GMT Subject: RFR: 8295964: Move InstanceKlass::_misc_flags In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 13:29:45 GMT, Coleen Phillimore wrote: > I moved misc_flags out to their own misc flags class so that we can put the writeable accessFlags there too with atomic access. > > Tested with tier1-4. Thank you for the code reviews, David and Serguei. 
------------- PR: https://git.openjdk.org/jdk/pull/10249 From coleenp at openjdk.org Mon Oct 31 12:55:36 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 31 Oct 2022 12:55:36 GMT Subject: RFR: 8295964: Move InstanceKlass::_misc_flags In-Reply-To: References: Message-ID: <7g0pl_nctOQ_VhQ_7fWCmP1rs3MdJCGOFjDp68nJ9ng=.ac107a90-ac03-4878-a118-a15808e5a4a8@github.com> On Fri, 28 Oct 2022 03:59:45 GMT, David Holmes wrote: >> src/hotspot/share/prims/jvmtiRedefineClasses.cpp line 4408: >> >>> 4406: if (!the_class->has_been_redefined()) { >>> 4407: the_class->set_has_been_redefined(); >>> 4408: } >> >> Nit: Is this change really needed? > > Seems unrelated to this refactoring. If really a bug it should be fixed separately. This is needed by this change. The has_been_redefined flag can be set at runtime and with these flags, there's an assert for all that they're only set once. This one didn't have the assert but it can only be set in a safepoint, so I have this code to make it an exception. For now. The plan is to make the _flags set once, and the _status field (to be added) set at runtime, and this can be moved to that. ------------- PR: https://git.openjdk.org/jdk/pull/10249 From coleenp at openjdk.org Mon Oct 31 12:55:37 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 31 Oct 2022 12:55:37 GMT Subject: RFR: 8295964: Move InstanceKlass::_misc_flags In-Reply-To: References: Message-ID: <2-CLvbMReYrUGtJP0fjN4STIj8n05RUGgR0fNIJHH30=.7eb66d78-d6cd-4ca1-99ab-0789db2f0be9@github.com> On Fri, 28 Oct 2022 04:01:18 GMT, David Holmes wrote: >> I moved misc_flags out to their own misc flags class so that we can put the writeable accessFlags there too with atomic access. >> >> Tested with tier1-4. > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 129: > >> 127: >> 128: MISC_REWRITTEN = db.lookupIntConstant("InstanceKlass::_misc_rewritten").intValue(); >> 129: MISC_HAS_NONSTATIC_FIELDS = db.lookupIntConstant("InstanceKlass::_misc_has_nonstatic_fields").intValue(); > > These all seem to have been deleted rather than modified ?? Yes, they are unused by the SA and not really needed for core dump analysis unless someone finds a reason to see these. If so, some support can be added at that time. A better approach would be to have some python code to decode this in gdb if needed (probably only for scratch_class). ------------- PR: https://git.openjdk.org/jdk/pull/10249 From coleenp at openjdk.org Mon Oct 31 13:03:26 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 31 Oct 2022 13:03:26 GMT Subject: Integrated: 8295964: Move InstanceKlass::_misc_flags In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 13:29:45 GMT, Coleen Phillimore wrote: > I moved misc_flags out to their own misc flags class so that we can put the writeable accessFlags there too with atomic access. > > Tested with tier1-4. This pull request has now been integrated. 
Changeset: 7e88209e Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/7e88209e6c28ce18974308382948555f7c524721 Stats: 352 lines in 9 files changed: 156 ins; 160 del; 36 mod 8295964: Move InstanceKlass::_misc_flags Reviewed-by: sspitsyn, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/10249 From luhenry at openjdk.org Mon Oct 31 13:22:32 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 31 Oct 2022 13:22:32 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v3] In-Reply-To: <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> References: <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> Message-ID: On Mon, 31 Oct 2022 02:35:18 GMT, Quan Anh Mai wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Require UseSSE >= 3 due transitive use of sse3 instructions from ReduceI > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3493: > >> 3491: // vnext = IntVector.broadcast(I256, power_of_31_backwards[0]); >> 3492: movdl(vnext, InternalAddress(power_of_31_backwards + (0 * sizeof(jint)))); >> 3493: vpbroadcastd(vnext, vnext, Assembler::AVX_256bit); > > `vpbroadcastd` can take an `Address` argument instead. An `InternalAddress` isn't an `Address` but an `AddressLiteral`. You can however do `as_Address(InternalAddress(power_of_31_backwards + (0 * sizeof(jint))))` > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3528: > >> 3526: vpmulld(vcoef[idx], vcoef[idx], vnext, Assembler::AVX_256bit); >> 3527: } >> 3528: jmp(LONG_VECTOR_LOOP_BEGIN); > > Calculating backward forces you to do calculating the coefficients on each iteration, I think doing this normally would be better. But doing it forward requires a `reduceLane` on each iteration. It's faster to do it backward. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From qamai at openjdk.org Mon Oct 31 13:38:47 2022 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 31 Oct 2022 13:38:47 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v3] In-Reply-To: References: <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> Message-ID: On Mon, 31 Oct 2022 13:18:35 GMT, Ludovic Henry wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3528: >> >>> 3526: vpmulld(vcoef[idx], vcoef[idx], vnext, Assembler::AVX_256bit); >>> 3527: } >>> 3528: jmp(LONG_VECTOR_LOOP_BEGIN); >> >> Calculating backward forces you to do calculating the coefficients on each iteration, I think doing this normally would be better. > > But doing it forward requires a `reduceLane` on each iteration. It's faster to do it backward. No you don't need to, the vector loop can be calculated as: IntVector accumulation = IntVector.zero(INT_SPECIES); for (int i = 0; i < bound; i += INT_SPECIES.length()) { IntVector current = IntVector.load(INT_SPECIES, array, i); accumulation = accumulation.mul(31**(INT_SPECIES.length())).add(current); } return accumulation.mul(IntVector.of(31**INT_SPECIES.length() - 1, ..., 31**2, 31, 1).reduce(ADD); Each iteration only requires a multiplication and an addition. The weight of lanes can be calculated just before the reduction operation. 
------------- PR: https://git.openjdk.org/jdk/pull/10847 From tsteele at openjdk.org Mon Oct 31 14:26:35 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 31 Oct 2022 14:26:35 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v15] In-Reply-To: References: Message-ID: <8FBcyq91Cxy9M-Z-XASyUdI_6BX4znaMKhVf4tStNzI=.e052202f-e343-41c9-8ed9-728fad368080@github.com> > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Move nmethod_entry_barrier implementation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/72382f7b..dfbe898d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=13-14 Stats: 71 lines in 6 files changed: 0 ins; 68 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From mdoerr at openjdk.org Mon Oct 31 14:29:08 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 31 Oct 2022 14:29:08 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v14] In-Reply-To: References: Message-ID: <0j3U9xUdibd4jm1p5ji0-XyaWZQnjUduh4Rtlcld5_E=.ff1b3814-5b95-4022-a8c3-ea8e630660a2@github.com> On Thu, 27 Oct 2022 23:46:43 GMT, Tyler Steele wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Changes > > - Adds labels to registers used in c2i_entry_barrier > - Removes use of R2 > - Removes erroneous uses of R0 > - Adds nm->c2i_entry_barrier in gen_i2c2i_adapters Also, in .hpp file, please. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From aph at openjdk.org Mon Oct 31 14:49:28 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 31 Oct 2022 14:49:28 GMT Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references Message-ID: JDK-8294902: Undefined Behavior in C2 regalloc with null references ------------- Commit messages: - More - Next - Next - Next - Next Changes: https://git.openjdk.org/jdk/pull/10920/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10920&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294902 Stats: 51 lines in 8 files changed: 34 ins; 1 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/10920.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10920/head:pull/10920 PR: https://git.openjdk.org/jdk/pull/10920 From tsteele at openjdk.org Mon Oct 31 15:10:44 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 31 Oct 2022 15:10:44 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v16] In-Reply-To: References: Message-ID: > This draft PR implements native method barriers on s390. 
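Spelling that sketch out as compilable code, here is a hypothetical reference version of the forward loop using the incubating jdk.incubator.vector API (zero-start, String-style hash; 31^8 and the per-lane weights are computed up front; class and method names are illustrative; needs --add-modules jdk.incubator.vector):

```
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public final class PolyHash {
    private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_256; // 8 int lanes

    public static int hash(int[] a) {
        int l = SPECIES.length();
        int mulPerBlock = 1;                     // 31^l with normal int wrap-around
        for (int i = 0; i < l; i++) mulPerBlock *= 31;

        int[] weights = new int[l];              // {31^(l-1), ..., 31^2, 31, 1}
        weights[l - 1] = 1;
        for (int i = l - 2; i >= 0; i--) weights[i] = weights[i + 1] * 31;
        IntVector wv = IntVector.fromArray(SPECIES, weights, 0);

        int bound = SPECIES.loopBound(a.length);
        IntVector acc = IntVector.zero(SPECIES);
        for (int i = 0; i < bound; i += l) {     // one multiply + one add per iteration
            acc = acc.mul(mulPerBlock).add(IntVector.fromArray(SPECIES, a, i));
        }
        int h = acc.mul(wv).reduceLanes(VectorOperators.ADD); // weight lanes once, at the end

        for (int i = bound; i < a.length; i++) { // scalar tail
            h = 31 * h + a[i];
        }
        return h;
    }
}
```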
When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Move nmethod_entry_barrier definition in hpp file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10558/files - new: https://git.openjdk.org/jdk/pull/10558/files/dfbe898d..c43e16ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10558&range=14-15 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10558.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10558/head:pull/10558 PR: https://git.openjdk.org/jdk/pull/10558 From mdoerr at openjdk.org Mon Oct 31 15:20:36 2022 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 31 Oct 2022 15:20:36 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v16] In-Reply-To: References: Message-ID: On Mon, 31 Oct 2022 15:10:44 GMT, Tyler Steele wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Move nmethod_entry_barrier definition in hpp file Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.org/jdk/pull/10558 From aph at openjdk.org Mon Oct 31 15:38:36 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 31 Oct 2022 15:38:36 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v16] In-Reply-To: References: Message-ID: On Mon, 31 Oct 2022 15:10:44 GMT, Tyler Steele wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Move nmethod_entry_barrier definition in hpp file src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 138: > 136: > 137: // Load value from current java object: > 138: __ z_lg(Z_R0_scratch, in_bytes(bs_nm->thread_disarmed_offset()), Z_thread); // 6 bytes Isn't this loading from the current java Thread? ------------- PR: https://git.openjdk.org/jdk/pull/10558 From joe.darcy at oracle.com Mon Oct 31 15:49:34 2022 From: joe.darcy at oracle.com (Joe Darcy) Date: Mon, 31 Oct 2022 08:49:34 -0700 Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <4e1bf2e9-592c-62b5-cb0a-43dc1b70b94a@oracle.com> On 10/25/2022 7:27 AM, Andrew Haley wrote: > On Thu, 20 Oct 2022 20:26:47 GMT, Vladimir Ivanov wrote: > >> The GCC bugs with `-ffast-math` only corrupts `FTZ` and `DAZ`. >> >> But `RC` and exception masks may be corrupted as well the same way and I believe the consequences are be similar (silent divergence in results during FP computations). > I think we can catch the things that are likely, and will result in silent corruption. We should limit this, I think, to rounding modes and denormals-to-zero. I don't think we should bother with exception masks. 
> > ------------- > > PR: https://git.openjdk.org/jdk/pull/10661 In terms of the overhead of using floating-point expression evaluation as a guard, are there still platforms where operating on subnormal values is pathologically slower? Some generations of SPARC chips had that behavior where a subnormal multiply would take, say 10,000 cycles, rather than 3 or 4 since the subnormal operations were implemented via trap handling. -Joe From tsteele at openjdk.org Mon Oct 31 16:00:50 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 31 Oct 2022 16:00:50 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v16] In-Reply-To: References: Message-ID: <2IKz9d7VJGhkUeGEkYHPMeNfDMkDYd688ohHlwGp9-s=.1c1cafa7-2231-4b26-8547-be366790a74f@github.com> On Mon, 31 Oct 2022 15:10:44 GMT, Tyler Steele wrote: >> This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Move nmethod_entry_barrier definition in hpp file Excellent :-) and thanks again for all the support. I believe it is time to: ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Mon Oct 31 16:00:50 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 31 Oct 2022 16:00:50 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v16] In-Reply-To: References: Message-ID: On Mon, 31 Oct 2022 15:36:12 GMT, Andrew Haley wrote: >> Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: >> >> Move nmethod_entry_barrier definition in hpp file > > src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 138: > >> 136: >> 137: // Load value from current java object: >> 138: __ z_lg(Z_R0_scratch, in_bytes(bs_nm->thread_disarmed_offset()), Z_thread); // 6 bytes > > Isn't this loading from the current java Thread? Just saw this. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Mon Oct 31 16:02:41 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 31 Oct 2022 16:02:41 GMT Subject: Integrated: 8294729: [s390] Implement nmethod entry barriers In-Reply-To: References: Message-ID: On Tue, 4 Oct 2022 14:27:09 GMT, Tyler Steele wrote: > This draft PR implements native method barriers on s390. When complete, this will fix the build, and bring the other benefits of [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) to that platform. This pull request has now been integrated. 
Changeset: f4d8c20c Author: Tyler Steele URL: https://git.openjdk.org/jdk/commit/f4d8c20c3b81f65f955591c64281a103225691d9 Stats: 207 lines in 10 files changed: 198 ins; 0 del; 9 mod 8294729: [s390] Implement nmethod entry barriers Reviewed-by: mdoerr, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/10558 From tsteele at openjdk.org Mon Oct 31 16:07:35 2022 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 31 Oct 2022 16:07:35 GMT Subject: RFR: 8294729: [s390] Implement nmethod entry barriers [v16] In-Reply-To: References: Message-ID: <4v9J1FwZgdQYJEGjPZVg2jkY1-M-Xvi3QlW9Ggnc6K8=.c3c5f382-8ec8-4a04-b682-c874ac1c2912@github.com> On Mon, 31 Oct 2022 15:58:05 GMT, Tyler Steele wrote: >> src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 138: >> >>> 136: >>> 137: // Load value from current java object: >>> 138: __ z_lg(Z_R0_scratch, in_bytes(bs_nm->thread_disarmed_offset()), Z_thread); // 6 bytes >> >> Isn't this loading from the current java Thread? > > Just saw this. I think your question is about the comment on line 137. Yes, this could be improved. ------------- PR: https://git.openjdk.org/jdk/pull/10558 From aph-open at littlepinkcloud.com Mon Oct 31 17:33:06 2022 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Mon, 31 Oct 2022 17:33:06 +0000 Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: <4e1bf2e9-592c-62b5-cb0a-43dc1b70b94a@oracle.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <4e1bf2e9-592c-62b5-cb0a-43dc1b70b94a@oracle.com> Message-ID: <421d3cd4-3197-8041-e051-8b4cee5f1083@littlepinkcloud.com> On 10/31/22 15:49, Joe Darcy wrote: > In terms of the overhead of using floating-point expression evaluation > as a guard, are there still platforms where operating on subnormal > values is pathologically slower? Some generations of SPARC chips had > that behavior where a subnormal multiply would take, say 10,000 cycles, > rather than 3 or 4 since the subnormal operations were implemented via > trap handling. That's a very interesting point. I know it used to be the case that denormals were handled by trapping to microcode, but there are good hardware algorithms since Schwarz et al, 2003 [1]. This paper showed how with a little hardware, such numbers can be handled close to the speed of normalized numbers. I deliberately ran my tests on a ten-year-old CPU, but I guess I'd have to go further back to find a bad case. Anyway, I plan to a. Restore the FPU CR after calls to dlopen(3). b. Detect FPU CR corruption at safepoints, and print a warning. At least the user might find out that something is wrong. I think this will avoid most cases of badness. I guess I'll need a CSR for this? [1] Hardware implementations of denormalized numbers, DOI:10.1109/ARITH.2003.1207662 Conference: Computer Arithmetic, 2003. Proceedings. 16th IEEE Symposium on -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at openjdk.org Mon Oct 31 17:48:17 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 31 Oct 2022 17:48:17 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: <_23f45GgIwXcJmH_ROBzCNrIEiwk_ho5Ic5IRoLOoZQ=.69e2715b-cea3-43e2-8569-4fb74416b86d@github.com> On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic On 10/31/22 15:49, Joe Darcy wrote: > In terms of the overhead of using floating-point expression evaluation > as a guard, are there still platforms where operating on subnormal > values is pathologically slower? Some generations of SPARC chips had > that behavior where a subnormal multiply would take, say 10,000 cycles, > rather than 3 or 4 since the subnormal operations were implemented via > trap handling. That's a very interesting point. I know it used to be the case that denormals were handled by trapping to microcode, but there are good hardware algorithms since Schwarz et al, 2003 [1]. This paper showed how with a little hardware, such numbers can be handled close to the speed of normalized numbers. I deliberately ran my tests on a ten-year-old CPU, but I guess I'd have to go further back to find a bad case. Anyway, I plan to a. Restore the FPU CR after calls to dlopen(3). b. Detect FPU CR corruption at safepoints, and print a warning. At least the user might find out that something is wrong. I think this will avoid most cases of badness. I guess I'll need a CSR for this? [1] Hardware implementations of denormalized numbers, DOI:10.1109/ARITH.2003.1207662 Conference: Computer Arithmetic, 2003. Proceedings. 16th IEEE Symposium on ------------- PR: https://git.openjdk.org/jdk/pull/10661 From coleenp at openjdk.org Mon Oct 31 18:25:29 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 31 Oct 2022 18:25:29 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v5] In-Reply-To: References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: On Fri, 28 Oct 2022 19:18:27 GMT, Matias Saavedra Silva wrote: >> As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. >> >> The text format and contents are tentative, please review. 
>> >> Here is an example output when using `findmethod()`: >> >> "Executing findmethod" >> flags (bitmask): >> 0x01 - print names of methods >> 0x02 - print bytecodes >> 0x04 - print the address of bytecodes >> 0x08 - print info for invokedynamic >> 0x10 - print info for invokehandle >> >> [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} >> 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V >> 0 iconst_0 >> 1 istore_1 >> 2 iload_1 >> 3 iconst_2 >> 4 if_icmpge 24 >> 7 getstatic 7 >> 10 invokedynamic bsm=31 13 >> BSM: REF_invokeStatic 32 >> arguments[1] = { >> 000 >> } >> ConstantPoolCacheEntry: 4 >> - this: 0x00007fffa0400570 >> - bytecode 1: invokedynamic ba >> - bytecode 2: nop 00 >> - cp index: 13 >> - F1: [ 0x00000008000c8658] >> - F2: [ 0x0000000000000003] >> - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) >> - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] >> - tos: object >> - local signature: 1 >> - has appendix: 1 >> - forced virtual: 0 >> - final: 1 >> - virtual Final: 0 >> - resolution Failed: 0 >> - num Parameters: 02 >> Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; >> appendix: java.lang.invoke.BoundMethodHandle$Species_LL >> {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' >> - ---- fields (total size 5 words): >> - private 'customizationCount' 'B' @12 0 (0x00) >> - private volatile 'updateInProgress' 'Z' @13 false (0x00) >> - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) >> - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) >> - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) >> - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) >> - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) >> - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) >> ------------- >> 15 putstatic 17 >> 18 iinc #1 1 >> 21 goto 2 >> 24 return > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Added resolution information for fields I look at this output. src/hotspot/share/oops/constantPool.cpp line 2321: > 2319: // print strings not indices > 2320: //st->print("klass_index=%d", uncached_klass_ref_index_at(index)); > 2321: //st->print(" name_and_type_index=%d", uncached_name_and_type_ref_index_at(index)); Should remove commented out code! src/hotspot/share/oops/constantPool.cpp line 2342: > 2340: case JVM_CONSTANT_NameAndType : > 2341: // st->print("name_index=%d", name_ref_index_at(index)); > 2342: // st->print(" signature_index=%d", signature_ref_index_at(index)); Same here. 
src/hotspot/share/oops/constantPool.hpp line 697: > 695: // Acquire symbols from method and field entries > 696: // For fields and methods > 697: char* name_symbol_at(int which) { return symbol_at(name_ref_index_at(uncached_name_and_type_ref_index_at(which)))->as_C_string(); } Can you align the {'s or use a line break and make these functions "const". src/hotspot/share/oops/cpCache.cpp line 690: > 688: st->print_cr(" - tos: %s\n - final: %d\n - volatile: %d\n - field index: %04x", > 689: type2name(as_BasicType(flag_state())), is_final(), is_volatile(), field_index()); > 690: st->print_cr(" - is resolved: %s", pool->tag_at(constant_pool_index()).is_unresolved_klass() ? "false" : "true"); This is confusing and doesn't really help. "is_resolved" means that the bytecode_1 or bytecode_2 is filled in in the indices field. The constant pool index here is a JVM_CONSTANT_Fieldref which points to JVM_CONSTANT_NameAndType and JVM_CONSTANT_{UnresolvedClass,Class,UnresolvedClassInError}. I think this would always print false. I would just leave this line out. ------------- PR: https://git.openjdk.org/jdk/pull/10860 From jvernee at openjdk.org Mon Oct 31 18:28:29 2022 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 31 Oct 2022 18:28:29 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: <_23f45GgIwXcJmH_ROBzCNrIEiwk_ho5Ic5IRoLOoZQ=.69e2715b-cea3-43e2-8569-4fb74416b86d@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <_23f45GgIwXcJmH_ROBzCNrIEiwk_ho5Ic5IRoLOoZQ=.69e2715b-cea3-43e2-8569-4fb74416b86d@github.com> Message-ID: On Mon, 31 Oct 2022 17:44:07 GMT, Andrew Haley wrote: > Anyway, I plan to > > a. Restore the FPU CR after calls to dlopen(3). > b. Detect FPU CR corruption at safepoints, and print a warning. At least > the user might find out that something is wrong. Doing (a) seems good. I can't say for sure whether (b) is a good idea. I guess you just have some call to verify the FPU, e.g. in `ParallelCleanupTask`? I assume you don't mean to change the code for polling safepoints. CSR seems like a good idea, since it could be a change in observable behavior. (nice to leave a paper trail I think) ------------- PR: https://git.openjdk.org/jdk/pull/10661 From matsaave at openjdk.org Mon Oct 31 18:45:18 2022 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 31 Oct 2022 18:45:18 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v5] In-Reply-To: References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: On Mon, 31 Oct 2022 18:21:02 GMT, Coleen Phillimore wrote: > I look at this output. I made the mistake of pushing incomplete code which explains the chunks of commented code and other issues. I am currently further extending the information being printed both in the constant pool cache and the constant pool as requested by @iklam. That being said, I will take your corrections into account as I move forward! ------------- PR: https://git.openjdk.org/jdk/pull/10860 From aph-open at littlepinkcloud.com Mon Oct 31 18:51:34 2022 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Mon, 31 Oct 2022 18:51:34 +0000 Subject: RFR: JDK-8294902: Undefined Behavior in C2 regalloc with null references Message-ID: <436eb6ce-25d5-ea1d-f52e-aa48e8034585@littlepinkcloud.com> This patch fixes the remaining null pointer dereference bugs that I know of. 
For the main bug, C2 was using a null reference to indicate an uninitialized Node_List. I replaced the null reference with a static sentinel. I also turned on -fsanitize=null and found and fixed a bunch of other null pointer dereferences. With this, I have run a full bootstrap and tier1 tests with -fsanitize=null enabled. I have checked that the code generated by GCC is not worse in any significant way, so I don't expect to see any performance regressions. I'd like to enable -fsanitize=null in debug builds to prevent regressions in this area. What do you think? ------------- Changes: https://git.openjdk.org/jdk/pull/10920/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10920&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8294902 Stats: 51 lines in 8 files changed: 34 ins; 1 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/10920.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10920/head:pull/10920 PR: https://git.openjdk.org/jdk/pull/10920 From aph at openjdk.org Mon Oct 31 18:53:21 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 31 Oct 2022 18:53:21 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v7] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <_23f45GgIwXcJmH_ROBzCNrIEiwk_ho5Ic5IRoLOoZQ=.69e2715b-cea3-43e2-8569-4fb74416b86d@github.com> Message-ID: On Mon, 31 Oct 2022 18:25:57 GMT, Jorn Vernee wrote: > > Anyway, I plan to > > a. Restore the FPU CR after calls to dlopen(3). > > b. Detect FPU CR corruption at safepoints, and print a warning. At least > > the user might find out that something is wrong. > > Doing (a) seems good. I can't say for sure whether (b) is a good idea. I guess you just have some call to verify the FPU, e.g. in `ParallelCleanupTask`? I assume you don't mean to change the code for polling safepoints. My intention is only to do the FPU CR check when there is an expensive operation such as moving to a safepoint. That way we won't see a significant slowdown. But we are pretty much guaranteed that the user won't get incorrect results without some sort of diagnostic. > CSR seems like a good idea, since it could be a change in observable behavior. (nice to leave a paper trail I think) Yep. ------------- PR: https://git.openjdk.org/jdk/pull/10661 From coleenp at openjdk.org Mon Oct 31 20:31:00 2022 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 31 Oct 2022 20:31:00 GMT Subject: RFR: 8295893: Improve printing of Constant Pool Cache Entries [v3] In-Reply-To: References: <_0zDuYxE3ZldKFZfB4InFvJve-CGaZXL-VpG1bVHbh4=.5aeb65e0-2847-4a35-8fb1-e7d7f238a5f8@github.com> Message-ID: On Thu, 27 Oct 2022 14:31:43 GMT, Matias Saavedra Silva wrote: >> As an extension of [JDK-8292699](https://bugs.openjdk.org/browse/JDK-8292699), this aims to further improve the printing of Constant Pool Cache entries. The contents and flag are decoded into human readable text with an appendix printed as before. >> >> The text format and contents are tentative, please review.
>> >> Here is an example output when using `findmethod()`: >> >> "Executing findmethod" >> flags (bitmask): >> 0x01 - print names of methods >> 0x02 - print bytecodes >> 0x04 - print the address of bytecodes >> 0x08 - print info for invokedynamic >> 0x10 - print info for invokehandle >> >> [ 0] 0x0000000801000800 class Concat0 loader data: 0x00007ffff02ddeb0 for instance a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007fef59110} >> 0x00007fffa0400368 static method main : ([Ljava/lang/String;)V >> 0 iconst_0 >> 1 istore_1 >> 2 iload_1 >> 3 iconst_2 >> 4 if_icmpge 24 >> 7 getstatic 7 >> 10 invokedynamic bsm=31 13 >> BSM: REF_invokeStatic 32 >> arguments[1] = { >> 000 >> } >> ConstantPoolCacheEntry: 4 >> - this: 0x00007fffa0400570 >> - bytecode 1: invokedynamic ba >> - bytecode 2: nop 00 >> - cp index: 13 >> - F1: [ 0x00000008000c8658] >> - F2: [ 0x0000000000000003] >> - Method: 0x00000008000c8658 java.lang.Object java.lang.invoke.Invokers$Holder.linkToTargetMethod(java.lang.Object, java.lang.Object) >> - flag values: [08|0|0|1|1|0|1|0|0|0|00|00|02] >> - tos: object >> - local signature: 1 >> - has appendix: 1 >> - forced virtual: 0 >> - final: 1 >> - virtual Final: 0 >> - resolution Failed: 0 >> - num Parameters: 02 >> Method: 0x00000008000c8658 java/lang/invoke/Invokers$Holder.linkToTargetMethod(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; >> appendix: java.lang.invoke.BoundMethodHandle$Species_LL >> {0x000000011f021360} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LL' >> - ---- fields (total size 5 words): >> - private 'customizationCount' 'B' @12 0 (0x00) >> - private volatile 'updateInProgress' 'Z' @13 false (0x00) >> - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x000000011f0185b0} = (Ljava/lang/String;)Ljava/lang/String; (0x23e030b6) >> - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x000000011f01df40} => a 'java/lang/invoke/MemberName'{0x000000011f0211e8} = {method} {0x00007fffa04012a8} 'invoke' '(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH+0x0000000801000400' (0x23e03be8) >> - private 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0x00000000) >> - private 'asTypeSoftCache' 'Ljava/lang/ref/SoftReference;' @28 NULL (0x00000000) >> - final 'argL0' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/DirectMethodHandle'{0x000000011f019b70} (0x23e0336e) >> - final 'argL1' 'Ljava/lang/Object;' @36 "000"{0x000000011f0193d0} (0x23e0327a) >> ------------- >> 15 putstatic 17 >> 18 iinc #1 1 >> 21 goto 2 >> 24 return > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > changed NULL to nullptr This looks good. The more verbose field entry information can be added in a further RFE. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/10860 From redestad at openjdk.org Mon Oct 31 21:48:37 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 31 Oct 2022 21:48:37 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v4] In-Reply-To: References: Message-ID: > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. 
Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ? 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 
44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Change scalar unroll to 2 element stride, minding dependency chain ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10847/files - new: https://git.openjdk.org/jdk/pull/10847/files/7e8a3e9c..a473c200 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=02-03 Stats: 64 lines in 1 file changed: 28 ins; 20 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Mon Oct 31 22:10:30 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 31 Oct 2022 22:10:30 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v4] In-Reply-To: References: Message-ID: <3ADHhMibv2q23PC2uQp57gFynSbqH6K4s0jCutZuogM=.b62084b3-bfab-4150-9b2a-e06813099ce8@github.com> On Mon, 31 Oct 2022 21:48:37 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ? 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ? 0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ? 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ? 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ? 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ? 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ? 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ? 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ? 
0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ? 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ? 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ? 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ? 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ? 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ? 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ? 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ? 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ? 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ? 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ? 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ? 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ? 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ? 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ? 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ? 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ? 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ? 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ? 62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ? 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ? 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ? 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ? 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ? 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ? 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ? 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ? 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ? 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ? 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ? 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ? 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Change scalar unroll to 2 element stride, minding dependency chain A stride of 2 allows small element cases to perform a bit better, while also performing better than before on longer arrays for the `byte` and `short` cases that don't get any benefit from vectorization: Benchmark (size) Mode Cnt Score Error Units ArraysHashCode.bytes 1 avgt 5 1.414 ? 0.005 ns/op ArraysHashCode.bytes 10 avgt 5 6.908 ? 0.020 ns/op ArraysHashCode.bytes 100 avgt 5 73.666 ? 0.390 ns/op ArraysHashCode.bytes 10000 avgt 5 7846.994 ? 53.628 ns/op ArraysHashCode.chars 1 avgt 5 1.414 ? 0.007 ns/op ArraysHashCode.chars 10 avgt 5 7.229 ? 0.044 ns/op ArraysHashCode.chars 100 avgt 5 30.718 ? 0.229 ns/op ArraysHashCode.chars 10000 avgt 5 1621.463 ? 116.286 ns/op ArraysHashCode.ints 1 avgt 5 1.414 ? 0.008 ns/op ArraysHashCode.ints 10 avgt 5 7.540 ? 0.042 ns/op ArraysHashCode.ints 100 avgt 5 29.429 ? 0.121 ns/op ArraysHashCode.ints 10000 avgt 5 1600.855 ? 9.274 ns/op ArraysHashCode.shorts 1 avgt 5 1.414 ? 0.010 ns/op ArraysHashCode.shorts 10 avgt 5 6.914 ? 0.045 ns/op ArraysHashCode.shorts 100 avgt 5 73.684 ? 0.501 ns/op ArraysHashCode.shorts 10000 avgt 5 7846.829 ? 
49.984 ns/op I've also made some changes to improve the String cases, which can avoid the first coeff*h multiplication on first pass. This gets the size 1 latin1 case down to 1.1ns/op without penalizing the empty case. We're now improving over the baseline on almost all* tested sizes: Benchmark (size) Mode Cnt Score Error Units StringHashCode.Algorithm.defaultLatin1 0 avgt 5 0.946 ? 0.005 ns/op StringHashCode.Algorithm.defaultLatin1 1 avgt 5 1.108 ? 0.003 ns/op StringHashCode.Algorithm.defaultLatin1 2 avgt 5 2.042 ? 0.005 ns/op StringHashCode.Algorithm.defaultLatin1 31 avgt 5 18.636 ? 0.286 ns/op StringHashCode.Algorithm.defaultLatin1 32 avgt 5 15.938 ? 1.086 ns/op StringHashCode.Algorithm.defaultUTF16 0 avgt 5 1.257 ? 0.004 ns/op StringHashCode.Algorithm.defaultUTF16 1 avgt 5 2.198 ? 0.005 ns/op StringHashCode.Algorithm.defaultUTF16 2 avgt 5 2.559 ? 0.011 ns/op StringHashCode.Algorithm.defaultUTF16 31 avgt 5 15.754 ? 0.036 ns/op StringHashCode.Algorithm.defaultUTF16 32 avgt 5 16.616 ? 0.042 ns/op Baseline: Benchmark (size) Mode Cnt Score Error Units StringHashCode.Algorithm.defaultLatin1 0 avgt 5 0.942 ? 0.005 ns/op StringHashCode.Algorithm.defaultLatin1 1 avgt 5 1.991 ? 0.013 ns/op StringHashCode.Algorithm.defaultLatin1 2 avgt 5 2.831 ? 0.021 ns/op StringHashCode.Algorithm.defaultLatin1 31 avgt 5 25.042 ? 0.112 ns/op StringHashCode.Algorithm.defaultLatin1 32 avgt 5 25.857 ? 0.133 ns/op StringHashCode.Algorithm.defaultUTF16 0 avgt 5 0.789 ? 0.006 ns/op StringHashCode.Algorithm.defaultUTF16 1 avgt 5 3.459 ? 0.007 ns/op StringHashCode.Algorithm.defaultUTF16 2 avgt 5 4.400 ? 0.010 ns/op StringHashCode.Algorithm.defaultUTF16 31 avgt 5 25.721 ? 0.067 ns/op StringHashCode.Algorithm.defaultUTF16 32 avgt 5 27.162 ? 0.093 ns/op There's a negligible regression on `defaultUTF16` for size = 0 due moving the length shift up earlier. This can only happen when running with CompactStrings disabled. And even if you were the change significantly helps improve size 1-31, which should more than make up for a small cost increase hashing empty strings. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Mon Oct 31 22:10:31 2022 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 31 Oct 2022 22:10:31 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v4] In-Reply-To: References: <5QCLl4R86LlhX9dkwbK7-NtPwkiN9tgQvj0VFoApvzU=.0b12f837-47d4-470a-9b40-961ccd8e181e@github.com> Message-ID: On Mon, 31 Oct 2022 13:35:36 GMT, Quan Anh Mai wrote: >> But doing it forward requires a `reduceLane` on each iteration. It's faster to do it backward. > > No you don't need to, the vector loop can be calculated as: > > IntVector accumulation = IntVector.zero(INT_SPECIES); > for (int i = 0; i < bound; i += INT_SPECIES.length()) { > IntVector current = IntVector.load(INT_SPECIES, array, i); > accumulation = accumulation.mul(31**(INT_SPECIES.length())).add(current); > } > return accumulation.mul(IntVector.of(31**INT_SPECIES.length() - 1, ..., 31**2, 31, 1).reduce(ADD); > > Each iteration only requires a multiplication and an addition. The weight of lanes can be calculated just before the reduction operation. Ok, I can try rewriting as @merykitty suggests and compare. I'm running out of time to spend on this right now, though, so I sort of hope we can do this experiment as a follow-up RFE. ------------- PR: https://git.openjdk.org/jdk/pull/10847
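For reference, the forward-accumulating vector loop sketched in the quoted comment above can be written out against the incubating jdk.incubator.vector API roughly as below. This is an illustrative sketch only, not code from the PR (the PR implements the hash loop as a C2 intrinsic): the class name, constants, and the zero seed (the String.hashCode convention; Arrays.hashCode seeds with 1, which adds a 31^n term) are assumptions made for illustration, and running it requires --add-modules jdk.incubator.vector.

    import jdk.incubator.vector.IntVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    public class VectorPolyHash {
        private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;
        private static final int L = SPECIES.length();
        private static final int POW_31_L;                 // 31^L (mod 2^32), the per-chunk multiplier
        private static final int[] WEIGHTS = new int[L];   // [31^(L-1), ..., 31, 1]

        static {
            int p = 1;
            for (int i = L - 1; i >= 0; i--) {
                WEIGHTS[i] = p;
                p *= 31;
            }
            POW_31_L = p;
        }

        // Zero-seeded polynomial hash: sum of a[i] * 31^(n-1-i), processed front to back.
        public static int hashCode(int[] a) {
            int h = 0;
            int i = 0;
            int bound = SPECIES.loopBound(a.length);
            if (bound > 0) {
                // Main loop: acc = acc * 31^L + a[i .. i+L-1], element-wise.
                IntVector acc = IntVector.zero(SPECIES);
                for (; i < bound; i += L) {
                    acc = acc.mul(POW_31_L).add(IntVector.fromArray(SPECIES, a, i));
                }
                // Weigh the lanes once, after the loop, then reduce.
                h = acc.mul(IntVector.fromArray(SPECIES, WEIGHTS, 0))
                       .reduceLanes(VectorOperators.ADD);
            }
            // Scalar tail keeps the usual h = 31*h + a[i] recurrence.
            for (; i < a.length; i++) {
                h = 31 * h + a[i];
            }
            return h;
        }
    }

The shape is the point being made in the quoted comment: each vector iteration is a single multiply and add, and the lane weights 31^(L-1), ..., 31, 1 are applied only once, just before the reduction, rather than reducing on every iteration.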