From duke at openjdk.java.net Tue Feb 1 03:43:34 2022 From: duke at openjdk.java.net (Yi-Fan Tsai) Date: Tue, 1 Feb 2022 03:43:34 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. Message-ID: 8251505: Use of types in compiler shared code should be consistent. ------------- Commit messages: - Remove JFR changes - Make the type of methodReclaimedCount consistent - Minor formatting - Use signed integers - 8251505: Use of types in compiler shared code should be consistent. Changes: https://git.openjdk.java.net/jdk/pull/7294/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8251505 Stats: 33 lines in 9 files changed: 2 ins; 0 del; 31 mod Patch: https://git.openjdk.java.net/jdk/pull/7294.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7294/head:pull/7294 PR: https://git.openjdk.java.net/jdk/pull/7294 From stuefe at openjdk.java.net Tue Feb 1 05:57:10 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 1 Feb 2022 05:57:10 GMT Subject: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files In-Reply-To: References: Message-ID: On Mon, 31 Jan 2022 22:12:30 GMT, David Holmes wrote: > > That's a valid concern. I've also asked myself this question when I had initially started using some assertions. We should not crash again during error reporting. I've therefore tried to be as conservative as possible and added bailouts instead, also in loops when reading data. But of course, this is just a best effort and by no means a guarantee to be safe (especially in terms of crashes). What could be alternatives to make this better? > > If the parsing code turns out to be very problematic in a signal handling context, then we could disable it in that context. So we really want to try and do a lot of testing by throwing random signals at the VM and see what breaks. 
> Source information in hs-err file stacks can be tremendously useful. Lets try the retry-callstack-dumping without features idea in case of a secondary crash, outlined above, first. > > > Secondly, on the same issue the use of unified logging within this code seems even more likely to be problematic - I'm not aware of us currently using UL during error reporting. It may work in basic usecases but if it triggers logfile rotation or other more complex actions what then? > > > > > > I haven't thought about this before. To be honest, I think UL printing of the `dwarf` tag is only useful during development when adding something new to the parser or when debugging. I don't see much value of these messages otherwise - even less for a Java user. As a first step, I could change the logs from `log_X()` to `log_develop_X()` but that just shifts the problem to non-product builds. Another option (or additional thing) could be to guard the log messages with a new develop flag that's disabled by default. By setting it for development, we accept that it might be unsafe which should be fine. > > I think changing the logging to develop only is a reasonable step. I don't see logging of crash handling / error reporting as generally useful for the end user. I think the right way to go longterm would be to give us a minimalistic safe logging API for these cases (signal handling, pre-initialization) or make UL safe to use always. ------------- PR: https://git.openjdk.java.net/jdk/pull/7126 From thartmann at openjdk.java.net Tue Feb 1 08:29:38 2022 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 1 Feb 2022 08:29:38 GMT Subject: [jdk18] RFR: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob Message-ID: Backport of [JDK-8278871](https://bugs.openjdk.java.net/browse/JDK-8278871). Applies cleanly. 
------------- Commit messages: - 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob Changes: https://git.openjdk.java.net/jdk18/pull/114/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk18&pr=114&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278871 Stats: 21 lines in 5 files changed: 9 ins; 4 del; 8 mod Patch: https://git.openjdk.java.net/jdk18/pull/114.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/114/head:pull/114 PR: https://git.openjdk.java.net/jdk18/pull/114 From stuefe at openjdk.java.net Tue Feb 1 08:58:27 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 1 Feb 2022 08:58:27 GMT Subject: RFR: JDK-8281015: Further simplify NMT backend Message-ID: NMT backend can be further simplified and cleaned out. - some entry points require NMT_TrackingLevel as arguments, some use the global tracking level. Ultimately, every part of NMT always uses the global tracking level, so in many cases the explicit parameter can be removed and the global tracking level can be used instead. - `MemTracker::malloc_header_size(level)` + `MemTracker::malloc_footer_size(level)` are fused into `MemTracker::overhead_per_malloc()` - when adding to `MallocSiteTable`, caller gets back a shortcut to the entry. That shortcut is stored verbatim in the malloc header. It consists of two 16-bit values (bucket index and chain position). That tuple finds its way into many argument lists. It can be simplified into a single 32-bit opaque marker. Code outside the MallocSiteTable does not need to know what it is. - Currently, the `MallocHeader` class contains a lot of logic. It accounts (in constructor) and de-accounts (in `MallocHeader::release()`). It would simplify code if `MallocHeader` were just a dumb data carrier and the `MallocTracker` would do the actual work. - `MallocHeader` can be simplified, almost all members made constant and modifying accessors removed. 
- In some places we handle inputptr=NULL gracefully where we should assert instead - Expressions like `MemTracker::tracking_level() != NMT_off` can be simplified to `MemTracker::enabled()`. - MemTracker::malloc_base (all variants) can be removed. Note that we have MallocTracker::malloc_header, which achieves the same and does not require casting to the header. Testing: - GHAs - manually ran NMT gtests (all NMT modes) and NMT jtreg tests on Ubuntu x64 - SAP nightlies ran through. Note that since 8275301 "Unify C-heap buffer overrun checks into NMT" NMT is enabled by default in debug builds, so it gets a lot more workout in tests now. Note that I wanted to manually verify that the gdb "call pp" command still works in order to not break Zhengyu's recent addition, but found its already broken. I filed https://bugs.openjdk.java.net/browse/JDK-8281023 and am preparing a separate patch. ------------- Commit messages: - pp should handle NULL correctly - remove mostly unused MallocTracker accessors for header members - Remove use of NMT level and simplify malloc+realloc+free - dumb down malloc header - mst bucket+pos=marker - remove malloc_base Changes: https://git.openjdk.java.net/jdk/pull/7283/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7283&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281015 Stats: 266 lines in 10 files changed: 49 ins; 147 del; 70 mod Patch: https://git.openjdk.java.net/jdk/pull/7283.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7283/head:pull/7283 PR: https://git.openjdk.java.net/jdk/pull/7283 From stuefe at openjdk.java.net Tue Feb 1 09:05:33 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 1 Feb 2022 09:05:33 GMT Subject: RFR: JDK-8281023: NMT integration into pp debug command does not work Message-ID: JDK-8280289 enhanced the debug pp() command to use NMT if enabled, and to print NMT related info. That is useful, but there are some issues. 
On debug, it just asserts, since the empty reserved region we create to hold the output of the mmap-search is created with address=NULL: (gdb) call pp(0x7ffff010b030) "Executing pp" Thread 2 "java" received signal SIGSEGV, Segmentation fault. 0x00007ffff6721a71 in VirtualMemoryRegion::VirtualMemoryRegion (this=this at entry=0x7ffff5bb2620, addr=addr at entry=0x0, size=size at entry=0) at /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/services/virtualMemoryTracker.hpp:180 180 assert(addr != NULL, "Invalid address"); On release we don't assert and get further, but the use of SafeFetch is slightly wrong. It will deny us any NMT data about p if *p==0: if (CanUseSafeFetchN() && SafeFetchN((intptr_t*)p, 0) != 0) { This patch: - fixes uses of SafeFetch - changes the mmap-region-search-code to not require an empty ReservedMemoryRegion in order to avoid triggering the assert in virtualMemoryTracker.hpp:180 - adds a comment about the safe use of pp() in gdb (one needs to switch off signal handling of SIGSEGV for this to work) Tests: - I tested manually that pp works with different levels of NMT (Linux x64) - GHAs in process ------------- Commit messages: - Fix NMT integration into pp() debug command Changes: https://git.openjdk.java.net/jdk/pull/7297/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7297&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281023 Stats: 28 lines in 3 files changed: 9 ins; 3 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/7297.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7297/head:pull/7297 PR: https://git.openjdk.java.net/jdk/pull/7297 From tschatzl at openjdk.java.net Tue Feb 1 09:18:12 2022 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 1 Feb 2022 09:18:12 GMT Subject: RFR: 8280916: Simplify HotSpot Style Guide editorial changes In-Reply-To: References: Message-ID: On Sun, 30 Jan 2022 00:39:20 GMT, Kim Barrett wrote: > Please review this change to the HotSpot Style Guide 
change process. > > The current process involves gathering consensus among the HotSpot Group > Members. That's fine for changes of substance. But it seems overly weighty > for editorial changes that don't affect the substance of the guide, but only > its clarity or accuracy. > > The proposed change would permit the normal PR process to be used for such > changes, but require the requisite reviewers to additionally be HotSpot Group > Members. > > Note that there have already been a couple of changes that effectively > followed the proposed new process. > https://bugs.openjdk.java.net/browse/JDK-8274169 > https://bugs.openjdk.java.net/browse/JDK-8280182 > > This is a modification of the Style Guide, so rough consensus among the > HotSpot Group members is required to make this change. Only Group members > should vote for approval (via the github PR), though reasoned objections or > comments from anyone will be considered. A decision on this proposal will not > be made before Monday 14-Feb-2022 at 12h00 UTC. > > Since we're piggybacking on github PRs here, please use the PR review process > to approve (click on Review Changes > Approve), rather than sending a "vote: > yes" email reply that would be normal for a CFV. Marked as reviewed by tschatzl (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/7281 From duke at openjdk.java.net Tue Feb 1 11:09:20 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Tue, 1 Feb 2022 11:09:20 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v14] In-Reply-To: References: Message-ID: On Mon, 31 Jan 2022 22:35:35 GMT, David Holmes wrote: >> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix popframe failures > > src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 429: > >> 427: #else >> 428: warning("UseROPProtection specified, but not supported in the VM."); >> 429: #endif > > If we issue these warnings should `_rop_protection` still be set true? As per this conversation: https://github.com/openjdk/jdk/pull/6334#discussion_r791722292 The idea was, the user is explicitly asking for pac-ret so we should honour that. Whereas standard would only enable what is supported for that system. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Tue Feb 1 12:15:11 2022 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 1 Feb 2022 12:15:11 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 03:35:13 GMT, Yi-Fan Tsai wrote: > 8251505: Use of types in compiler shared code should be consistent. Changes requested by eastig at github.com (no known OpenJDK username). src/hotspot/share/utilities/elfFile.cpp line 94: > 92: } > 93: > 94: bool FileReader::set_position(int64_t offset) { You introduce a bug here. `fseek` declaration: int fseek ( FILE * stream, long int offset, int origin ); `fseek` will read only 32 bits of `offset` if `sizeof(long)==4`. 
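To illustrate the truncation concern raised in the review comment above, here is a minimal C++ sketch. It assumes two hypothetical helpers, `fits_in_long` and `set_position_checked`, which are not part of HotSpot and are not the fix that was applied in the PR. The idea is to check that a 64-bit offset is representable as `long`, the type `fseek` takes, before narrowing it. Where `long` is 64 bits wide every `int64_t` fits; where `long` is 32 bits wide, large offsets are rejected instead of being silently truncated.

```cpp
#include <cstdint>
#include <cstdio>
#include <limits>

// Hypothetical helper, for illustration only: true if `offset` can be
// represented by `long`, the type fseek() takes for its offset argument.
static bool fits_in_long(int64_t offset) {
  return offset >= std::numeric_limits<long>::min() &&
         offset <= std::numeric_limits<long>::max();
}

// Sketch of a set_position-like wrapper that refuses to truncate.
// Returns false for a null stream, an unrepresentable offset, or a
// failing fseek() call.
static bool set_position_checked(FILE* fd, int64_t offset) {
  if (fd == nullptr || !fits_in_long(offset)) {
    return false;
  }
  return fseek(fd, static_cast<long>(offset), SEEK_SET) == 0;
}
```

The range check costs nothing where `long` is 64 bits wide and turns a silent wrong seek into a reported failure elsewhere.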
------------- PR: https://git.openjdk.java.net/jdk/pull/7294 From dholmes at openjdk.java.net Tue Feb 1 12:45:12 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 1 Feb 2022 12:45:12 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v14] In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 11:05:46 GMT, Alan Hayward wrote: >> src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 429: >> >>> 427: #else >>> 428: warning("UseROPProtection specified, but not supported in the VM."); >>> 429: #endif >> >> If we issue these warnings should `_rop_protection` still be set true? > > As per this conversation: https://github.com/openjdk/jdk/pull/6334#discussion_r791722292 > > The idea was, the user is explicitly asking for pac-ret so we should honour that. Whereas standard would only enable what is supported for that system. But we can't honour that because it is not supported. Further, the suggestion in the referenced discussion seemed to be based on the assumption that doing so would be harmless because it is NOP based, but you have indicated that may not be the case and so it may actually lead to a crash! ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From mdoerr at openjdk.java.net Tue Feb 1 13:30:40 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 1 Feb 2022 13:30:40 GMT Subject: RFR: 8281043: Intrinsify recursive ObjectMonitor locking for PPC64 Message-ID: PPC64 implementation of JDK-8277180. 
------------- Commit messages: - 8281043: Intrinsify recursive ObjectMonitor locking for PPC64 Changes: https://git.openjdk.java.net/jdk/pull/7305/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7305&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281043 Stats: 21 lines in 1 file changed: 8 ins; 3 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/7305.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7305/head:pull/7305 PR: https://git.openjdk.java.net/jdk/pull/7305 From duke at openjdk.java.net Tue Feb 1 13:47:12 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Tue, 1 Feb 2022 13:47:12 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v14] In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 12:42:26 GMT, David Holmes wrote: >> As per this conversation: https://github.com/openjdk/jdk/pull/6334#discussion_r791722292 >> >> The idea was, the user is explicitly asking for pac-ret so we should honour that. Whereas standard would only enable what is supported for that system. > > But we can't honour that because it is not supported. Further, the suggestion in the referenced discussion seemed to be based on the assumption that doing so would be harmless because it is NOP based, but you have indicated that may not be the case and so it may actually lead to a crash! Before I change anything - @theRealAph you had an opinion here too... ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From hseigel at openjdk.java.net Tue Feb 1 14:13:51 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 1 Feb 2022 14:13:51 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v2] In-Reply-To: References: Message-ID: > Please review this new attempt to resolve JDK-8214976. This fix adds Pragmas to generate compilation errors, when using gcc, if calling a native system function instead of the os:: version of the function. 
The fix includes changes to calls in non-shared code because it is cleaner than adding PRAGMAs and, for some cases, the os:: version of a function has added value, such as asserts and RESTARTABLE. This fix slightly changes the signature of os::abort() so it wouldn't conflict with native abort() functions. Changes to Windows code is left for a future RFE. > > This fix was tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, Mach5 tiers 3-5 on Linux x64, and Mach5 builds of Zero, PPC, and s390. > > Thanks, Harold Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: changes to address some review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7248/files - new: https://git.openjdk.java.net/jdk/pull/7248/files/8cc29ebe..ca2097e4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7248&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7248&range=00-01 Stats: 49 lines in 8 files changed: 3 ins; 25 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/7248.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7248/head:pull/7248 PR: https://git.openjdk.java.net/jdk/pull/7248 From hseigel at openjdk.java.net Tue Feb 1 14:13:57 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 1 Feb 2022 14:13:57 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v2] In-Reply-To: References: Message-ID: On Fri, 28 Jan 2022 04:55:48 GMT, David Holmes wrote: >> Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: >> >> changes to address some review comments > > src/hotspot/share/runtime/os.hpp line 533: > >> 531: // platforms that support such things. This calls shutdown() and then aborts. 
>> 532: static void abort(bool dump_core, void *siginfo, const void *context); >> 533: static void abort(bool dump_core); > > I don't understand why the change to the default arg was needed. There should be no conflict between `os::abort()` and `::abort()`. I reverted the abort() changes. Thanks for correcting this. > src/hotspot/share/utilities/compilerWarnings_gcc.hpp line 97: > >> 95: FORBID_C_FUNCTION(FILE* fopen(const char*, const char*), "use os::fopen"); >> 96: FORBID_C_FUNCTION(int fsync(int), "use os::fsync"); >> 97: FORBID_C_FUNCTION(int ftruncate(int, off_t), "use os::ftruncate"); > > Shouldn't this be ftruncate for BSD and ftruncate64 for other Posix (not sure what Windows has)? Platform agnostic code would call ftruncate(), not ftruncate64(). So I think this is correct as is. > src/hotspot/share/utilities/compilerWarnings_gcc.hpp line 99: > >> 97: FORBID_C_FUNCTION(int ftruncate(int, off_t), "use os::ftruncate"); >> 98: FORBID_C_FUNCTION(void funlockfile(FILE *), "use os::funlockfile"); >> 99: FORBID_C_FUNCTION(off_t lseek(int, off_t, int), "use os::lseek"); > > Similarly there should be a lseek64 definition too. Like ftruncate(), platform agnostic code would call lseek(), not lseek64(). So I think this is correct as is. > src/hotspot/share/utilities/ostream.cpp line 615: > >> 613: >> 614: PRAGMA_DIAG_PUSH >> 615: PRAGMA_PERMIT_FORBIDDEN_C_FUNCTION(write); > > Why do we not call os::write here? fixed ------------- PR: https://git.openjdk.java.net/jdk/pull/7248 From hseigel at openjdk.java.net Tue Feb 1 14:13:52 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 1 Feb 2022 14:13:52 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability In-Reply-To: References: Message-ID: On Thu, 27 Jan 2022 19:18:10 GMT, Harold Seigel wrote: > Please review this new attempt to resolve JDK-8214976. 
This fix adds Pragmas to generate compilation errors, when using gcc, if calling a native system function instead of the os:: version of the function. The fix includes changes to calls in non-shared code because it is cleaner than adding PRAGMAs and, for some cases, the os:: version of a function has added value, such as asserts and RESTARTABLE. This fix slightly changes the signature of os::abort() so it wouldn't conflict with native abort() functions. Changes to Windows code is left for a future RFE. > > This fix was tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, Mach5 tiers 3-5 on Linux x64, and Mach5 builds of Zero, PPC, and s390. > > Thanks, Harold The second commit contains minor changes in response to review comments. The changes include removing unneeded "os::" from os_*.cpp files, reverting all changes to os_aix.cpp, reverting changes to abort(), removing "#include " and related functions from compilerWarnings_gcc.hpp, and changing "::write()" to "os::write()" in ostream.cpp. This update does not address bigger issues such as structure and placement concerns and whether or not to do this change at all. ------------- PR: https://git.openjdk.java.net/jdk/pull/7248 From hseigel at openjdk.java.net Tue Feb 1 14:13:53 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 1 Feb 2022 14:13:53 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v2] In-Reply-To: References: Message-ID: On Fri, 28 Jan 2022 19:32:20 GMT, Kim Barrett wrote: >> src/hotspot/os/aix/os_aix.cpp line 2499: >> >>> 2497: struct dirent *ptr; >>> 2498: >>> 2499: dir = os::opendir(path); >> >> Just to clarify, as we are in the scope of the os class both `opendir` and `os::opendir` are the same thing here - and similarly for other code in the os class - right? > > Yes, that's correct. So an unqualified opendir here should not trigger a forbidden warning. I removed "os::" from the os class files. 
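The scoping point discussed above can be sketched in isolation. This is a minimal illustration, assuming a stand-in class `os_like` with a static `open_dir` member and a same-named global function; these names are hypothetical and are neither HotSpot's `os` class nor the POSIX `opendir`. Inside a member function, unqualified lookup finds the class member first, so the call never reaches the global function that a forbid-style attribute would flag.

```cpp
#include <string>

// Stand-in for a native C function (hypothetical name).
static std::string open_dir(const char*) { return "global"; }

// Stand-in for HotSpot's os class (hypothetical, illustration only).
struct os_like {
  static std::string open_dir(const char*) { return "member"; }

  // Unqualified lookup inside the class scope finds os_like::open_dir.
  static std::string unqualified() { return open_dir("."); }

  // Explicit qualification resolves to the same member.
  static std::string qualified() { return os_like::open_dir("."); }

  // A leading :: is required to reach the global function.
  static std::string global() { return ::open_dir("."); }
};
```

Under these assumptions, dropping the class qualifier inside the class's own member functions changes nothing about which function is called; only an explicit `::` prefix selects the global one.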
------------- PR: https://git.openjdk.java.net/jdk/pull/7248 From hseigel at openjdk.java.net Tue Feb 1 14:13:59 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 1 Feb 2022 14:13:59 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v2] In-Reply-To: References: Message-ID: On Sat, 29 Jan 2022 07:09:18 GMT, Thomas Stuefe wrote: >> We only compile AIX with xclang these days. I don't know how our "xlc" compiler platform mechanism interacts with our "gcc" (which is really both gcc and clang) compiler platform, or if it interacts, or if it should. But none of that matters for the dirent.h problem. The problem there is that it's a system header, irrespective of what compiler is being used, and it has this problem. So whether we need this NULL cruft here depends on whether AIX with xclang uses this file or not. One option would be to just not deal with the dirent stuff yet, saving that for a followup focused on that problem. > > Sorry, I'm confused. We build AIX with xlc. I don't believe we even include this file on AIX. How does this help AIX? I removed the changes for the dirent functions and removed the above code. I also reverted all changes to os_aix.cpp. ------------- PR: https://git.openjdk.java.net/jdk/pull/7248 From hseigel at openjdk.java.net Tue Feb 1 14:14:00 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 1 Feb 2022 14:14:00 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v2] In-Reply-To: References: Message-ID: On Fri, 28 Jan 2022 19:33:21 GMT, Kim Barrett wrote: >> Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: >> >> changes to address some review comments > > src/hotspot/share/utilities/compilerWarnings_gcc.hpp line 114: > >> 112: >> 113: #define FORBID_C_FUNCTION(signature, alternative) >> 114: #define PRAGMA_PERMIT_FORBIDDEN_C_FUNCTION(name) > > These aren't needed. 
The default empty definitions in compilerWarnings.hpp cover this case. fixed ------------- PR: https://git.openjdk.java.net/jdk/pull/7248 From chagedorn at openjdk.java.net Tue Feb 1 14:56:10 2022 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 1 Feb 2022 14:56:10 GMT Subject: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files In-Reply-To: References: Message-ID: On Mon, 31 Jan 2022 22:12:30 GMT, David Holmes wrote: > Hi Christian, > > Sorry for the delay in coming back to this, I wanted to see what other feedback arose. No problem, thanks for your feedback David! > > > That's a valid concern. I've also asked myself this question when I had initially started using some assertions. We should not crash again during error reporting. I've therefore tried to be as conservative as possible and added bailouts instead, also in loops when reading data. But of course, this is just a best effort and by no means a guarantee to be safe (especially in terms of crashes). What could be alternatives to make this better? > > > > > > If the parsing code turns out to be very problematic in a signal handling context, then we could disable it in that context. So we really want to try and do a lot of testing by throwing random signals at the VM and see what breaks. > > Source information in hs-err file stacks can be tremendously useful. Lets try the retry-callstack-dumping without features idea in case of a secondary crash, outlined above, first. Should we still handle that in a separate RFE later or should this go along with this patch/prerequisite? What do you think? > > > > Secondly, on the same issue the use of unified logging within this code seems even more likely to be problematic - I'm not aware of us currently using UL during error reporting. It may work in basic usecases but if it triggers logfile rotation or other more complex actions what then? > > > > > > > > > I haven't thought about this before. 
To be honest, I think UL printing of the `dwarf` tag is only useful during development when adding something new to the parser or when debugging. I don't see much value of these messages otherwise - even less for a Java user. As a first step, I could change the logs from `log_X()` to `log_develop_X()` but that just shifts the problem to non-product builds. Another option (or additional thing) could be to guard the log messages with a new develop flag that's disabled by default. By setting it for development, we accept that it might be unsafe which should be fine. > > > > > > I think changing the logging to develop only is a reasonable step. I don't see logging of crash handling / error reporting as generally useful for the end user. > > I think the right way to go longterm would be to give us a minimalistic safe logging API for these cases (signal handling, pre-initialization) or make UL safe to use always. That would be ideal if UL usage could be made safe in the future for these cases. But as for now, I will start by changing the logging to develop to limit UL usage to debug builds only which does not affect end users anymore. ------------- PR: https://git.openjdk.java.net/jdk/pull/7126 From zgu at openjdk.java.net Tue Feb 1 14:58:13 2022 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 1 Feb 2022 14:58:13 GMT Subject: RFR: JDK-8281023: NMT integration into pp debug command does not work In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 08:36:42 GMT, Thomas Stuefe wrote: > JDK-8280289 enhanced the debug pp() command to use NMT if enabled, and to print NMT related info. That is useful, but there are some issues. > > On debug, it just asserts, since the empty reserved region we create to hold the output of the mmap-search is created with address=NULL: > > > (gdb) call pp(0x7ffff010b030) > > "Executing pp" > > Thread 2 "java" received signal SIGSEGV, Segmentation fault. 
> 0x00007ffff6721a71 in VirtualMemoryRegion::VirtualMemoryRegion (this=this at entry=0x7ffff5bb2620, addr=addr at entry=0x0, size=size at entry=0) at /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/services/virtualMemoryTracker.hpp:180 > 180 assert(addr != NULL, "Invalid address"); > > > On release we don't assert and get further, but the use of SafeFetch is slightly wrong. It will deny us any NMT data about p if *p==0: > > > if (CanUseSafeFetchN() && SafeFetchN((intptr_t*)p, 0) != 0) { > > > This patch: > - fixes uses of SafeFetch > - changes the mmap-region-search-code to not require an empty ReservedMemoryRegion in order to avoid triggering the assert in virtualMemoryTracker.hpp:180 > - adds a comment about the safe use of pp() in gdb (one needs to switch off signal handling of SIGSEGV for this to work) > > Tests: > - I tested manually that pp works with different levels of NMT (Linux x64) > - GHAs in process Changes requested by zgu (Reviewer). src/hotspot/share/services/virtualMemoryTracker.cpp line 699: > 697: walk_virtual_memory(&walker); > 698: return walker.region(); > 699: } Snapshot the region is for avoiding race pointed out by Ioi in code review, because other thread might release the region after walk. src/hotspot/share/utilities/debug.cpp line 505: > 503: // is handled quietly by the VM, but it will trip up the debugger. gdb will catch the signal and disable > 504: // the pp() command for further use. > 505: // In order to avoid that, before invoking pp(), switch off SIGSEGV handling with "handle SIGSEGV nostop". Ah, my .gdbinit has `handle SIGSEGV nostop noprint pass` ------------- PR: https://git.openjdk.java.net/jdk/pull/7297 From mdoerr at openjdk.java.net Tue Feb 1 15:16:50 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 1 Feb 2022 15:16:50 GMT Subject: RFR: 8281043: Intrinsify recursive ObjectMonitor locking for PPC64 [v2] In-Reply-To: References: Message-ID: > PPC64 implementation of JDK-8277180. 
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Shorter and better readable recursions increment sequence. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7305/files - new: https://git.openjdk.java.net/jdk/pull/7305/files/2428acce..1eec2373 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7305&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7305&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/7305.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7305/head:pull/7305 PR: https://git.openjdk.java.net/jdk/pull/7305 From prappo at openjdk.java.net Tue Feb 1 16:31:38 2022 From: prappo at openjdk.java.net (Pavel Rappo) Date: Tue, 1 Feb 2022 16:31:38 GMT Subject: RFR: 8281057: Fix doc references to overriding in JLS Message-ID: While looking into guts of javadoc comment inheritance, I noticed that a number of places in JDK seem to confuse JLS 8.4.6.** with JLS 8.4.8.**. Granted, "8.4.6 Method Throws" tangentially addresses overriding. However, I believe that the real target should be "8.4.8. Inheritance, Overriding, and Hiding" and its subsections. 
------------- Commit messages: - Initial commit Changes: https://git.openjdk.java.net/jdk/pull/7311/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7311&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281057 Stats: 18 lines in 5 files changed: 0 ins; 0 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/7311.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7311/head:pull/7311 PR: https://git.openjdk.java.net/jdk/pull/7311 From mdoerr at openjdk.java.net Tue Feb 1 17:30:24 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 1 Feb 2022 17:30:24 GMT Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames Message-ID: s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details. ------------- Commit messages: - 8281061: [s390] JFR runs into assertions while validating interpreter frames Changes: https://git.openjdk.java.net/jdk/pull/7312/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7312&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281061 Stats: 12 lines in 2 files changed: 2 ins; 3 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/7312.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7312/head:pull/7312 PR: https://git.openjdk.java.net/jdk/pull/7312 From darcy at openjdk.java.net Tue Feb 1 17:34:10 2022 From: darcy at openjdk.java.net (Joe Darcy) Date: Tue, 1 Feb 2022 17:34:10 GMT Subject: RFR: 8281057: Fix doc references to overriding in JLS In-Reply-To: References: Message-ID: <64qmLKz3G6wYRdnA0Y65Fho7oUZqint8EKD-a1GXkJY=.4536f347-7bcd-4d74-b6e3-01441cf41f2c@github.com> On Tue, 1 Feb 2022 16:19:01 GMT, Pavel Rappo wrote: > While looking into guts of javadoc comment inheritance, I noticed that a number of places in JDK seem to confuse JLS 8.4.6.** with JLS 8.4.8.**. > > Granted, "8.4.6 Method Throws" tangentially addresses overriding. 
However, I believe that the real target should be "8.4.8. Inheritance, Overriding, and Hiding" and its subsections. Marked as reviewed by darcy (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/7311 From iris at openjdk.java.net Tue Feb 1 17:45:12 2022 From: iris at openjdk.java.net (Iris Clark) Date: Tue, 1 Feb 2022 17:45:12 GMT Subject: RFR: 8281057: Fix doc references to overriding in JLS In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 16:19:01 GMT, Pavel Rappo wrote: > While looking into guts of javadoc comment inheritance, I noticed that a number of places in JDK seem to confuse JLS 8.4.6.** with JLS 8.4.8.**. > > Granted, "8.4.6 Method Throws" tangentially addresses overriding. However, I believe that the real target should be "8.4.8. Inheritance, Overriding, and Hiding" and its subsections. Marked as reviewed by iris (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/7311 From thartmann at openjdk.java.net Tue Feb 1 17:45:23 2022 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 1 Feb 2022 17:45:23 GMT Subject: [jdk18] Integrated: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 08:20:29 GMT, Tobias Hartmann wrote: > Backport of [JDK-8278871](https://bugs.openjdk.java.net/browse/JDK-8278871). Applies cleanly. Fix request is pending. This pull request has now been integrated. 
Changeset: 2531c332 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk18/commit/2531c332f89c5faedf71ce1737373581c9abf905 Stats: 21 lines in 5 files changed: 9 ins; 4 del; 8 mod 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob Backport-of: 6f0e8da6d3bef340299e48977d5e17d05eabe682 ------------- PR: https://git.openjdk.java.net/jdk18/pull/114 From duke at openjdk.java.net Tue Feb 1 18:29:36 2022 From: duke at openjdk.java.net (Yi-Fan Tsai) Date: Tue, 1 Feb 2022 18:29:36 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v2] In-Reply-To: References: Message-ID: > 8251505: Use of types in compiler shared code should be consistent. Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Fix a regression ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7294/files - new: https://git.openjdk.java.net/jdk/pull/7294/files/b08834de..5c0e4349 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7294.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7294/head:pull/7294 PR: https://git.openjdk.java.net/jdk/pull/7294 From duke at openjdk.java.net Tue Feb 1 18:32:44 2022 From: duke at openjdk.java.net (Yi-Fan Tsai) Date: Tue, 1 Feb 2022 18:32:44 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v3] In-Reply-To: References: Message-ID: > 8251505: Use of types in compiler shared code should be consistent. 
Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Fix a regression ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7294/files - new: https://git.openjdk.java.net/jdk/pull/7294/files/5c0e4349..d775761d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7294.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7294/head:pull/7294 PR: https://git.openjdk.java.net/jdk/pull/7294 From aph at openjdk.java.net Tue Feb 1 18:38:21 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 1 Feb 2022 18:38:21 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v14] In-Reply-To: References: Message-ID: On Mon, 31 Jan 2022 22:25:38 GMT, David Holmes wrote: >> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix popframe failures > > src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 417: > >> 415: // Enable PAC if this code has been built with branch-protection and the CPU/OS supports it. >> 416: #ifdef __ARM_FEATURE_PAC_DEFAULT >> 417: if (_features & CPU_PACA) { > > Style nit: no implicit booleans - expand as "if ( A & B != 0)" Oh yuck, really? This is my punishment for not paying attention to the style guide discussions.
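A side note on the style nit quoted above: taken literally, `A & B != 0` parses as `A & (B != 0)`, because `!=` binds more tightly than `&` in C++, so the explicit form needs its own parentheses. The following is a minimal sketch of the difference. The `CPU_PACA` value and the helper names are made up for illustration and are not taken from HotSpot's `VM_Version`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the HotSpot feature bit; the real value is not
// shown in this thread.
static const uint64_t CPU_PACA = UINT64_C(1) << 4;

// Explicit boolean test, with the parentheses that the nit implies.
static bool has_paca(uint64_t features) {
  return (features & CPU_PACA) != 0;
}

// What `features & CPU_PACA != 0` computes when written without parentheses:
// `CPU_PACA != 0` is evaluated first and yields 1, so only bit 0 is tested.
static bool has_paca_unparenthesized(uint64_t features) {
  return (features & static_cast<uint64_t>(CPU_PACA != 0)) != 0;
}

// Small self-check covering the case where only the PACA bit is set.
static void check_examples() {
  const uint64_t features = CPU_PACA;
  assert(has_paca(features));                  // bit 4 is set
  assert(!has_paca_unparenthesized(features)); // bit 0 is not set
}
```

With only the hypothetical PACA bit set, the parenthesized test reports the feature and the unparenthesized one does not, which is why `(A & B) != 0` is the safer spelling.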
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Tue Feb 1 18:38:21 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 1 Feb 2022 18:38:21 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v14] In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 12:42:26 GMT, David Holmes wrote: >> As per this conversation: https://github.com/openjdk/jdk/pull/6334#discussion_r791722292 >> >> The idea was, the user is explicitly asking for asking for pac-ret so we should honour that. Whereas standard would only enable what is supported for that system. > > But we can't honour that because it is not supported. Further, the suggestion in the referenced discussion seemed to be based on the assumption that doing so would be harmless because it is NOP based, but you have indicated that may not be the case and so it may actually lead to a crash! Given that the implementation has now changed so much that it's no longer NOP based, I'll go with @dholmes-ora . One other thing, though: it might be better to say here "but this VM was built without ROP-protection support." That's more informative, IMO. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Tue Feb 1 18:38:49 2022 From: duke at openjdk.java.net (Yi-Fan Tsai) Date: Tue, 1 Feb 2022 18:38:49 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v4] In-Reply-To: References: Message-ID: > 8251505: Use of types in compiler shared code should be consistent. 
Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Remove unintentional formatting ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7294/files - new: https://git.openjdk.java.net/jdk/pull/7294/files/d775761d..05597245 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7294.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7294/head:pull/7294 PR: https://git.openjdk.java.net/jdk/pull/7294 From shade at openjdk.java.net Tue Feb 1 20:58:14 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 1 Feb 2022 20:58:14 GMT Subject: RFR: 8280867: Cpuid1Ecx feature parsing is incorrect for AMD CPUs In-Reply-To: References: Message-ID: On Mon, 31 Jan 2022 11:26:29 GMT, Aleksey Shipilev wrote: > See discussion in the bug. AFAICS, the fix is to "just" shift the flags by one to match both Intel and AMD specs. I believe this is not a serious bug, because adjacent bits in AMD case are set on modern chips, and Intel detection code only uses `lzcnt` and `prefetchw` out of these flags, both with Intel-specific hacks that are dropped now. > > Additional testing: > - [x] Linux x86_64 fastdebug on TR 3970X (Zen 2) > - [x] Linux x86_64 fastdebug on i5-11500 (Rocket Lake) > - [x] Eyeballing `-Xlog:os+cpu` on TR 3970X (Zen 2) -- no change in detected flags > - [x] Eyeballing `-Xlog:os+cpu` on i5-11500 (Rocket Lake) -- no change in detected flags Right, thanks for reviews! 
------------- PR: https://git.openjdk.java.net/jdk/pull/7287 From shade at openjdk.java.net Tue Feb 1 20:58:14 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 1 Feb 2022 20:58:14 GMT Subject: Integrated: 8280867: Cpuid1Ecx feature parsing is incorrect for AMD CPUs In-Reply-To: References: Message-ID: On Mon, 31 Jan 2022 11:26:29 GMT, Aleksey Shipilev wrote: > See discussion in the bug. AFAICS, the fix is to "just" shift the flags by one to match both Intel and AMD specs. I believe this is not a serious bug, because adjacent bits in AMD case are set on modern chips, and Intel detection code only uses `lzcnt` and `prefetchw` out of these flags, both with Intel-specific hacks that are dropped now. > > Additional testing: > - [x] Linux x86_64 fastdebug on TR 3970X (Zen 2) > - [x] Linux x86_64 fastdebug on i5-11500 (Rocket Lake) > - [x] Eyeballing `-Xlog:os+cpu` on TR 3970X (Zen 2) -- no change in detected flags > - [x] Eyeballing `-Xlog:os+cpu` on i5-11500 (Rocket Lake) -- no change in detected flags This pull request has now been integrated. Changeset: a18beb47 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/a18beb4797a1ca6fc6b31e997be48b2bd91c6ac0 Stats: 9 lines in 1 file changed: 0 ins; 1 del; 8 mod 8280867: Cpuid1Ecx feature parsing is incorrect for AMD CPUs Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/7287 From phh at openjdk.java.net Tue Feb 1 21:09:11 2022 From: phh at openjdk.java.net (Paul Hohensee) Date: Tue, 1 Feb 2022 21:09:11 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v4] In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 18:38:49 GMT, Yi-Fan Tsai wrote: >> 8251505: Use of types in compiler shared code should be consistent. > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Remove unintentional formatting Why not use 'jlong' everywhere instead of int64_t? 
You've got a mix of them (well, compileBroker.* uses jlong), better to be consistent. Ditto INT64_FORMAT and JLONG_FORMAT. You'd also avoid declaring "declare_integer_type(int64_t)" in vmStructs.cpp. In gc_globals.hpp, use of intx is likely intentional, so I'd leave it alone. intx resolves to intptr_t (see globalDefinitions.hpp), which is 32 bits on 32-bit systems and 64 bits on 64-bit ones, which is what you want. ------------- Changes requested by phh (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7294 From duke at openjdk.java.net Tue Feb 1 22:56:54 2022 From: duke at openjdk.java.net (Yi-Fan Tsai) Date: Tue, 1 Feb 2022 22:56:54 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v5] In-Reply-To: References: Message-ID: > 8251505: Use of types in compiler shared code should be consistent. Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: temp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7294/files - new: https://git.openjdk.java.net/jdk/pull/7294/files/05597245..22fbe08a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=03-04 Stats: 26 lines in 7 files changed: 0 ins; 1 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/7294.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7294/head:pull/7294 PR: https://git.openjdk.java.net/jdk/pull/7294 From duke at openjdk.java.net Tue Feb 1 23:08:48 2022 From: duke at openjdk.java.net (Yi-Fan Tsai) Date: Tue, 1 Feb 2022 23:08:48 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v6] In-Reply-To: References: Message-ID: > 8251505: Use of types in compiler shared code should be consistent. Yi-Fan Tsai has refreshed the contents of this pull request, and previous commits have been removed.
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Use jlong instead of int64_t ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7294/files - new: https://git.openjdk.java.net/jdk/pull/7294/files/22fbe08a..bca7b783 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=04-05 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7294.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7294/head:pull/7294 PR: https://git.openjdk.java.net/jdk/pull/7294 From dholmes at openjdk.java.net Tue Feb 1 23:15:11 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 1 Feb 2022 23:15:11 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v6] In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 23:08:48 GMT, Yi-Fan Tsai wrote: >> 8251505: Use of types in compiler shared code should be consistent. > > Yi-Fan Tsai has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Use jlong instead of int64_t Do not use jlong everywhere. We should only be using jlong where we have values that will interact with Java code and so have to be jlong to be compatible. Otherwise for a 64-bit type we should be using int64_t, or uint64_t as appropriate. For a type that should be 32-bit or 64-bit depending on the environment we should use intx, intptr_t or size_t depending on the nature of the variable. Also do not force-push changes, just commit any changes as normal and push them. When the change is integrated the skara tooling will flatten things into one clean commit. 
If you force-push you mess up the PR review process. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/7294 From dholmes at openjdk.java.net Wed Feb 2 00:13:12 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 2 Feb 2022 00:13:12 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v2] In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 14:08:40 GMT, Harold Seigel wrote: >> src/hotspot/share/utilities/compilerWarnings_gcc.hpp line 97: >> >>> 95: FORBID_C_FUNCTION(FILE* fopen(const char*, const char*), "use os::fopen"); >>> 96: FORBID_C_FUNCTION(int fsync(int), "use os::fsync"); >>> 97: FORBID_C_FUNCTION(int ftruncate(int, off_t), "use os::ftruncate"); >> >> Shouldn't this be ftruncate for BSD and ftruncate64 for other Posix (not sure what Windows has)? > > Platform agnostic code would call ftruncate(), not ftruncate64(). So I think this is correct as is. You need to enable the warning for the function that we would use, which we are not supposed to use and that would be `ftruncate64` on Linux. >> src/hotspot/share/utilities/compilerWarnings_gcc.hpp line 99: >> >>> 97: FORBID_C_FUNCTION(int ftruncate(int, off_t), "use os::ftruncate"); >>> 98: FORBID_C_FUNCTION(void funlockfile(FILE *), "use os::funlockfile"); >>> 99: FORBID_C_FUNCTION(off_t lseek(int, off_t, int), "use os::lseek"); >> >> Similarly there should be a lseek64 definition too. > > Like ftruncate(), platform agnostic code would call lseek(), not lseek64(). So I think this is correct as is. I disagree - you are not enabling the warnings for all the functions that would be used. ------------- PR: https://git.openjdk.java.net/jdk/pull/7248 From duke at openjdk.java.net Wed Feb 2 01:52:46 2022 From: duke at openjdk.java.net (Yi-Fan Tsai) Date: Wed, 2 Feb 2022 01:52:46 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. 
[v7] In-Reply-To: References: Message-ID: > 8251505: Use of types in compiler shared code should be consistent. Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Revert "Use jlong instead of int64_t" ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7294/files - new: https://git.openjdk.java.net/jdk/pull/7294/files/bca7b783..01f3b1f2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=05-06 Stats: 23 lines in 4 files changed: 1 ins; 0 del; 22 mod Patch: https://git.openjdk.java.net/jdk/pull/7294.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7294/head:pull/7294 PR: https://git.openjdk.java.net/jdk/pull/7294 From dlong at openjdk.java.net Wed Feb 2 03:39:07 2022 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 2 Feb 2022 03:39:07 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v7] In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 01:52:46 GMT, Yi-Fan Tsai wrote: >> 8251505: Use of types in compiler shared code should be consistent. > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Use jlong instead of int64_t" How about making the traversal mark type unsigned? ------------- PR: https://git.openjdk.java.net/jdk/pull/7294 From stuefe at openjdk.java.net Wed Feb 2 08:22:39 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 2 Feb 2022 08:22:39 GMT Subject: RFR: JDK-8281023: NMT integration into pp debug command does not work [v2] In-Reply-To: References: Message-ID: > JDK-8280289 enhanced the debug pp() command to use NMT if enabled, and to print NMT related info. That is useful, but there are some issues. 
> > On debug, it just asserts, since the empty reserved region we create to hold the output of the mmap-search is created with address=NULL: > > > (gdb) call pp(0x7ffff010b030) > > "Executing pp" > > Thread 2 "java" received signal SIGSEGV, Segmentation fault. > 0x00007ffff6721a71 in VirtualMemoryRegion::VirtualMemoryRegion (this=this at entry=0x7ffff5bb2620, addr=addr at entry=0x0, size=size at entry=0) at /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/services/virtualMemoryTracker.hpp:180 > 180 assert(addr != NULL, "Invalid address"); > > > On release we don't assert and get further, but the use of SafeFetch is slightly wrong. It will deny us any NMT data about p if *p==0: > > > if (CanUseSafeFetchN() && SafeFetchN((intptr_t*)p, 0) != 0) { > > > This patch: > - fixes uses of SafeFetch > - changes the mmap-region-search-code to not require an empty ReservedMemoryRegion in order to avoid triggering the assert in virtualMemoryTracker.hpp:180 > - adds a comment about the safe use of pp() in gdb (one needs to switch off signal handling of SIGSEGV for this to work) > > Tests: > - I tested manually that pp works with different levels of NMT (Linux x64) > - GHAs in process Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Zhengyus remarks ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7297/files - new: https://git.openjdk.java.net/jdk/pull/7297/files/6ad3018a..10dc66ed Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7297&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7297&range=00-01 Stats: 103 lines in 5 files changed: 56 ins; 33 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/7297.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7297/head:pull/7297 PR: https://git.openjdk.java.net/jdk/pull/7297 From stuefe at openjdk.java.net Wed Feb 2 08:27:14 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 2 Feb 2022 08:27:14 
GMT Subject: RFR: JDK-8281023: NMT integration into pp debug command does not work [v2] In-Reply-To: References: Message-ID: <02M2JuFVSa-pZX75zgzYLEpBO7A_kihJZeSws5lrMCc=.e0f89c55-6f74-42b3-9307-d00449cd0730@github.com> On Tue, 1 Feb 2022 14:55:10 GMT, Zhengyu Gu wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Zhengyus remarks > > Changes requested by zgu (Reviewer). @zhengyu123 thanks for looking at this. I rewrote the patch and made printing inline within lock protection. For symmetry, I also moved the printing code on the malloc side to the malloc tracker. Added comments. Note that I think @iklam was overcautious in the original discussion. We already risk signals in the debugging session by using SafeFetch to read the malloc header. Using a reference to a potentially dead ReservedMemoryRegion runs an additional - very low - risk of signals. So, if we really want to be cautious, we should not print out NMT information at all in pp. But I think NMT is very useful and worth the risk. All we risk is a slightly miffed debugger. ------------- PR: https://git.openjdk.java.net/jdk/pull/7297 From stuefe at openjdk.java.net Wed Feb 2 08:30:08 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 2 Feb 2022 08:30:08 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v2] In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 14:07:49 GMT, Harold Seigel wrote: >> Sorry, I'm confused. We build AIX with xlc. I don't believe we even include this file on AIX. How does this help AIX? > > I removed the changes for the dirent functions and removed the above code. I also reverted all changes to os_aix.cpp. Thank you @hseigel ! 
------------- PR: https://git.openjdk.java.net/jdk/pull/7248 From duke at openjdk.java.net Wed Feb 2 09:28:48 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Wed, 2 Feb 2022 09:28:48 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v15] In-Reply-To: References: Message-ID: <3RcA40D5i_vhwGr49mD78aBNsaLjV_d13kEsTuL9S5I=.1d9d6d4a-a055-47b1-ae6f-41c26d1fae76@github.com> > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. 
If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Fix up nits ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/0b476542..b7925614 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=14 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=13-14 Stats: 15 lines in 6 files changed: 2 ins; 3 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Wed Feb 2 09:32:14 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Wed, 2 Feb 2022 09:32:14 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v14] In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 18:33:28 GMT, Andrew Haley wrote: >> But we can't honour that because it is not supported.
Further, the suggestion in the referenced discussion seemed to be based on the assumption that doing so would be harmless because it is NOP based, but you have indicated that may not be the case and so it may actually lead to a crash! > > Given that the implementation has now changed so much that it's no longer NOP based, I'll go with @dholmes-ora . > One other thing, though: it might be better to say here "but this VM was built without ROP-protection support." That's more informative, IMO. Ok, I'll fix up as suggested. The beginning part of that message needs fixing too - UseROPProtection is no longer the name of the flag. I'll switch to: "ROP-protection specified, but this VM was built without ROP-protection support." ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Wed Feb 2 09:37:15 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Wed, 2 Feb 2022 09:37:15 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v14] In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 09:29:21 GMT, Alan Hayward wrote: >> Given that the implementation has now changed so much that it's no longer NOP based, I'll go with @dholmes-ora . >> One other thing, though: it might be better to say here "but this VM was built without ROP-protection support." That's more informative, IMO. > > Ok, I'll fix up as suggested. > > The beginning part of that message needs fixing too - UseROPProtection is no longer the name of the flag. I'll switch to: > "ROP-protection specified, but this VM was built without ROP-protection support." And this change will keep ROP protection enabled if we fall into the "this VM was built without ROP-protection support.". In that case we'll be protecting generated code, but the VM itself won't be protected. This will run without crashing. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From prappo at openjdk.java.net Wed Feb 2 10:01:06 2022 From: prappo at openjdk.java.net (Pavel Rappo) Date: Wed, 2 Feb 2022 10:01:06 GMT Subject: RFR: 8281057: Fix doc references to overriding in JLS In-Reply-To: References: Message-ID: <1V1_jhMx07iBPq8rzhWP2pwNkb8lBNpeqP3jzA06lf0=.79432eb1-3bf4-4d28-acb7-f5c07bf1f0c4@github.com> On Tue, 1 Feb 2022 16:19:01 GMT, Pavel Rappo wrote: > While looking into guts of javadoc comment inheritance, I noticed that a number of places in JDK seem to confuse JLS 8.4.6.** with JLS 8.4.8.**. > > Granted, "8.4.6 Method Throws" tangentially addresses overriding. However, I believe that the real target should be "8.4.8. Inheritance, Overriding, and Hiding" and its subsections. I would appreciate it if serviceability and hotspot could review this PR too. ------------- PR: https://git.openjdk.java.net/jdk/pull/7311 From aph at openjdk.java.net Wed Feb 2 10:22:13 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 2 Feb 2022 10:22:13 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v14] In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 09:34:20 GMT, Alan Hayward wrote: > And this change will keep ROP protection enabled if we fall into the "this VM was built without ROP-protection support.". In that case we'll be protecting generated code, but the VM itself won't be protected. This will run without crashing. That's perfect. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From dholmes at openjdk.java.net Wed Feb 2 12:08:06 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 2 Feb 2022 12:08:06 GMT Subject: RFR: 8281057: Fix doc references to overriding in JLS In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 16:19:01 GMT, Pavel Rappo wrote: > While looking into guts of javadoc comment inheritance, I noticed that a number of places in JDK seem to confuse JLS 8.4.6.** with JLS 8.4.8.**. 
> > Granted, "8.4.6 Method Throws" tangentially addresses overriding. However, I believe that the real target should be "8.4.8. Inheritance, Overriding, and Hiding" and its subsections. src/jdk.compiler/share/classes/com/sun/tools/javac/code/Symbol.java line 670: > 668: * modifier is ignored for this test. > 669: * > 670: * See JLS 8.4.8.1 (without transitivity) and 8.4.8.4 Any idea what the "(without transitivity)" is referring to here and elsewhere? ------------- PR: https://git.openjdk.java.net/jdk/pull/7311 From dholmes at openjdk.java.net Wed Feb 2 12:14:04 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 2 Feb 2022 12:14:04 GMT Subject: RFR: 8281057: Fix doc references to overriding in JLS In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 16:19:01 GMT, Pavel Rappo wrote: > While looking into guts of javadoc comment inheritance, I noticed that a number of places in JDK seem to confuse JLS 8.4.6.** with JLS 8.4.8.**. > > Granted, "8.4.6 Method Throws" tangentially addresses overriding. However, I believe that the real target should be "8.4.8. Inheritance, Overriding, and Hiding" and its subsections. Hi Pavel, All the section number changes look good and accurate. I have one query above, and also spotted one existing comment that is not correct. Thanks, David src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Check.java line 1793: > 1791: } > 1792: > 1793: // Error if static method overrides instance method (JLS 8.4.8.2). "overrides" should be "hides" ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7311 From prappo at openjdk.java.net Wed Feb 2 12:34:05 2022 From: prappo at openjdk.java.net (Pavel Rappo) Date: Wed, 2 Feb 2022 12:34:05 GMT Subject: RFR: 8281057: Fix doc references to overriding in JLS In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 12:04:29 GMT, David Holmes wrote: >> While looking into guts of javadoc comment inheritance, I noticed that a number of places in JDK seem to confuse JLS 8.4.6.** with JLS 8.4.8.**. >> >> Granted, "8.4.6 Method Throws" tangentially addresses overriding. However, I believe that the real target should be "8.4.8. Inheritance, Overriding, and Hiding" and its subsections. > > src/jdk.compiler/share/classes/com/sun/tools/javac/code/Symbol.java line 670: > >> 668: * modifier is ignored for this test. >> 669: * >> 670: * See JLS 8.4.8.1 (without transitivity) and 8.4.8.4 > > Any idea what the "(without transitivity)" is referring to here and elsewhere? My guess is that "transitivity" here refers to the subclass relationship being the transitive closure of the direct subclass relationship. Could it also be that the "quirk" paragraph starting at com/sun/tools/javac/code/Symbol.java:2057 is relevant here? @mcimadamore? ------------- PR: https://git.openjdk.java.net/jdk/pull/7311 From prappo at openjdk.java.net Wed Feb 2 12:46:05 2022 From: prappo at openjdk.java.net (Pavel Rappo) Date: Wed, 2 Feb 2022 12:46:05 GMT Subject: RFR: 8281057: Fix doc references to overriding in JLS In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 12:06:39 GMT, David Holmes wrote: >> While looking into guts of javadoc comment inheritance, I noticed that a number of places in JDK seem to confuse JLS 8.4.6.** with JLS 8.4.8.**. >> >> Granted, "8.4.6 Method Throws" tangentially addresses overriding. However, I believe that the real target should be "8.4.8. Inheritance, Overriding, and Hiding" and its subsections. 
> > src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Check.java line 1793: > >> 1791: } >> 1792: >> 1793: // Error if static method overrides instance method (JLS 8.4.8.2). > > "overrides" should be "hides" Although you seem to be correct, the error messages and the code around operate using the term "override": // Error if static method overrides instance method (JLS 8.4.8.2). if ((m.flags() & STATIC) != 0 && (other.flags() & STATIC) == 0) { log.error(TreeInfo.diagnosticPositionFor(m, tree), Errors.OverrideStatic(cannotOverride(m, other))); m.flags_field |= BAD_OVERRIDE; return; } // Error if instance method overrides static or final // method (JLS 8.4.8.1). if ((other.flags() & FINAL) != 0 || (m.flags() & STATIC) == 0 && (other.flags() & STATIC) != 0) { log.error(TreeInfo.diagnosticPositionFor(m, tree), Errors.OverrideMeth(cannotOverride(m, other), asFlagSet(other.flags() & (FINAL | STATIC)))); m.flags_field |= BAD_OVERRIDE; return; } /** * compiler.err.override.static=\ * {0}\n\ * overriding method is static */ public static Error OverrideStatic(Fragment arg0) { return new Error("compiler", "override.static", arg0); } Compiler folk, what do you think? ------------- PR: https://git.openjdk.java.net/jdk/pull/7311 From zgu at openjdk.java.net Wed Feb 2 13:34:09 2022 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 2 Feb 2022 13:34:09 GMT Subject: RFR: JDK-8281023: NMT integration into pp debug command does not work [v2] In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 08:22:39 GMT, Thomas Stuefe wrote: >> JDK-8280289 enhanced the debug pp() command to use NMT if enabled, and to print NMT related info. That is useful, but there are some issues. >> >> On debug, it just asserts, since the empty reserved region we create to hold the output of the mmap-search is created with address=NULL: >> >> >> (gdb) call pp(0x7ffff010b030) >> >> "Executing pp" >> >> Thread 2 "java" received signal SIGSEGV, Segmentation fault. 
>> 0x00007ffff6721a71 in VirtualMemoryRegion::VirtualMemoryRegion (this=this at entry=0x7ffff5bb2620, addr=addr at entry=0x0, size=size at entry=0) at /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/services/virtualMemoryTracker.hpp:180 >> 180 assert(addr != NULL, "Invalid address"); >> >> >> On release we don't assert and get further, but the use of SafeFetch is slightly wrong. It will deny us any NMT data about p if *p==0: >> >> >> if (CanUseSafeFetchN() && SafeFetchN((intptr_t*)p, 0) != 0) { >> >> >> This patch: >> - fixes uses of SafeFetch >> - changes the mmap-region-search-code to not require an empty ReservedMemoryRegion in order to avoid triggering the assert in virtualMemoryTracker.hpp:180 >> - adds a comment about the safe use of pp() in gdb (one needs to switch off signal handling of SIGSEGV for this to work) >> >> Tests: >> - I tested manually that pp works with different levels of NMT (Linux x64) >> - GHAs in process > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Zhengyus remarks Changes requested by zgu (Reviewer). src/hotspot/share/services/mallocTracker.cpp line 301: > 299: bool MallocTracker::print_pointer_information(const void* p, outputStream* st) { > 300: assert(MemTracker::enabled(), "NMT must be enabled"); > 301: if (CanUseSafeFetchN() && os::is_readable_pointer(p)) { `os::is_readable_pointer()` uses `CanUseSafeFetch32()`, you may want to check `CanUseSafeFetch32()` instead of `CanUseSafeFetchN()`. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7297 From david.holmes at oracle.com Wed Feb 2 13:39:38 2022 From: david.holmes at oracle.com (David Holmes) Date: Wed, 2 Feb 2022 23:39:38 +1000 Subject: RFR: 8281057: Fix doc references to overriding in JLS In-Reply-To: References: Message-ID: On 2/02/2022 10:46 pm, Pavel Rappo wrote: > On Wed, 2 Feb 2022 12:06:39 GMT, David Holmes wrote: > >>> While looking into guts of javadoc comment inheritance, I noticed that a number of places in JDK seem to confuse JLS 8.4.6.** with JLS 8.4.8.**. >>> >>> Granted, "8.4.6 Method Throws" tangentially addresses overriding. However, I believe that the real target should be "8.4.8. Inheritance, Overriding, and Hiding" and its subsections. >> >> src/jdk.compiler/share/classes/com/sun/tools/javac/comp/Check.java line 1793: >> >>> 1791: } >>> 1792: >>> 1793: // Error if static method overrides instance method (JLS 8.4.8.2). >> >> "overrides" should be "hides" > > Although you seem to be correct, the error messages and the code around operate using the term "override": Ah yes, I can see now that "overrides" is (incorrectly) used all through this code and even in the error messages. It is a subtle distinction. Cheers, David ----- > // Error if static method overrides instance method (JLS 8.4.8.2). > if ((m.flags() & STATIC) != 0 && > (other.flags() & STATIC) == 0) { > log.error(TreeInfo.diagnosticPositionFor(m, tree), > Errors.OverrideStatic(cannotOverride(m, other))); > m.flags_field |= BAD_OVERRIDE; > return; > } > > // Error if instance method overrides static or final > // method (JLS 8.4.8.1). 
> if ((other.flags() & FINAL) != 0 || > (m.flags() & STATIC) == 0 && > (other.flags() & STATIC) != 0) { > log.error(TreeInfo.diagnosticPositionFor(m, tree), > Errors.OverrideMeth(cannotOverride(m, other), > asFlagSet(other.flags() & (FINAL | STATIC)))); > m.flags_field |= BAD_OVERRIDE; > return; > } > > > /** > * compiler.err.override.static=\ > * {0}\n\ > * overriding method is static > */ > public static Error OverrideStatic(Fragment arg0) { > return new Error("compiler", "override.static", arg0); > } > > Compiler folk, what do you think? > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/7311 From duke at openjdk.java.net Wed Feb 2 13:55:42 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Wed, 2 Feb 2022 13:55:42 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v16] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. 
> > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required.
Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Change pac-ret defaults on non PAC machines ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/b7925614..78da1bd0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=15 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=14-15 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From mcimadamore at openjdk.java.net Wed Feb 2 14:41:11 2022 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Wed, 2 Feb 2022 14:41:11 GMT Subject: RFR: 8281057: Fix doc references to overriding in JLS In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 12:31:04 GMT, Pavel Rappo wrote: >> src/jdk.compiler/share/classes/com/sun/tools/javac/code/Symbol.java line 670: >> >>> 668: * modifier is ignored for this test. >>> 669: * >>> 670: * See JLS 8.4.8.1 (without transitivity) and 8.4.8.4 >> >> Any idea what the "(without transitivity)" is referring to here and elsewhere? > > My guess is that "transitivity" here refers to the subclass relationship being the transitive closure of the direct subclass relationship. Could it also be that the "quirk" paragraph starting at com/sun/tools/javac/code/Symbol.java:2057 is relevant here? @mcimadamore? First, this class contains many references to 8.4.6.x - which should really be 8.4.8.x - not just this one. I'm not 100% sure about the "without transitivity" comment, but if I had to guess I'd say that it refers to the fact that the checks described in 8.4.8.3 are missing from this routine. 
More specifically, this section: It is a compile-time error if a class or interface C has a member method m1 and there exists a method m2 declared in C or a superclass or superinterface of C, A, such that all of the following are true: * m1 and m2 have the same name. * m2 is accessible (§6.6) from C. * The signature of m1 is not a subsignature (§8.4.2) of the signature of m2 as a member of the supertype of C that names A. * The declared signature of m1 or some method m1 overrides (directly or indirectly) has the same erasure as the declared signature of m2 or some method m2 overrides (directly or indirectly). <---------- As you can see, the last bullet introduces some sort of global requirement across the inheritance chain; this constraint was necessary after Java 5, as generics require the introduction of bridge methods, and it is possible, in some extreme cases, for a subclass to override (accidentally) a bridge method. The JLS doesn't say the word "bridge method" anywhere, but this is what this check morally does. Now, in an early version of the Java compiler (5 and 6, IIRC), we used to check for clashes with bridge methods at code generation time. So, the checks in the compiler frontend, such as Symbol::overrides, did not really have to concern themselves with that expensive side of the override check. But, as the implementation matured, it became clearer that (a) the code-generation clash check was not enough to detect all problematic cases and (b) detecting clashes at code generation time was *too late*, especially for clients of the compiler API which might only run the "analyze" step. For these reasons, starting from Java 7, the frontend also has a more expensive check which supports 8.4.8.3 in full (Check::checkOverrideClashes).
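To make the bridge-method point concrete, here is a small Java sketch. It is an illustration under stated assumptions, not code from javac: `BridgeDemo`, `Node` and `StringNode` are hypothetical names, and the commented-out method is what I would expect the 8.4.8.3 check to reject as a name clash, without having run that case here.

```java
import java.lang.reflect.Method;

// Illustrative only: BridgeDemo, Node and StringNode are hypothetical names.
class BridgeDemo {
    static class Node<T> {
        public void set(T value) { }      // erased signature: set(Object)
    }

    static class StringNode extends Node<String> {
        @Override
        public void set(String value) { } // javac also emits a synthetic bridge set(Object)

        // Expected to be rejected under JLS 8.4.8.3: set(Object) is not a
        // subsignature of set(String), yet it has the same erasure as Node.set(T),
        // so it would collide with the bridge method.
        // public void set(Object value) { }
    }

    // Counts the compiler-generated bridge methods declared directly in a class.
    static int countBridges(Class<?> type) {
        int bridges = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.isBridge()) {
                bridges++;
            }
        }
        return bridges;
    }

    public static void main(String[] args) {
        for (Method m : StringNode.class.getDeclaredMethods()) {
            System.out.println(m + " bridge=" + m.isBridge());
        }
    }
}
```

Listing the declared methods of `StringNode` shows both the source-level `set(String)` and the bridge `set(Object)`, which is the method a subclass could clash with accidentally.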
Of course, not being the author of that comment, this is only my best guess as to what that could mean :-) ------------- PR: https://git.openjdk.java.net/jdk/pull/7311 From prappo at openjdk.java.net Wed Feb 2 15:44:11 2022 From: prappo at openjdk.java.net (Pavel Rappo) Date: Wed, 2 Feb 2022 15:44:11 GMT Subject: RFR: 8281057: Fix doc references to overriding in JLS In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 14:37:39 GMT, Maurizio Cimadamore wrote: >> My guess is that "transitivity" here refers to the subclass relationship being the transitive closure of the direct subclass relationship. Could it also be that the "quirk" paragraph starting at com/sun/tools/javac/code/Symbol.java:2057 is relevant here? @mcimadamore? > > First, this class contains many references to 8.4.6.x - which should really be 8.4.8.x - not just this one. > > I'm not 100% sure about the "without transitivity" comment, but if I had to guess I'd say that it refers to the fact that the checks described in 8.4.8.3 are missing from this routine. More specifically, this section: > > > It is a compile-time error if a class or interface C has a member method m1 and there exists a method m2 declared in C or a superclass or superinterface of C, A, such that all of the following are true: > * m1 and m2 have the same name. > * m2 is accessible (§6.6) from C. > * The signature of m1 is not a subsignature (§8.4.2) of the signature of m2 as a member of the supertype of C that names A. > * The declared signature of m1 or some method m1 overrides (directly or indirectly) has the same erasure as the declared signature of m2 or some method m2 overrides (directly or indirectly). <---------- > > As you can see, the last bullet introduces some sort of global requirement across the inheritance chain; this constraint was necessary after Java 5, as generics require the introduction of bridge methods, and it is possible, in some extreme cases, for a subclass to override (accidentally) a bridge method.
The JLS doesn't say the word "bridge method" anywhere, but this is what this check morally does. > > Now, in an early version of the Java compiler (5 and 6, IIRC), we used to check for clashes with bridge methods at code generation time. So, the checks in the compiler frontend, such as Symbol::overrides, did not really have to concern themselves with that expensive side of the override check. > > But, as the implementation matured, it became clearer that (a) the code-generation clash check was not enough to detect all problematic cases and (b) detecting clashes at code generation time was *too late*, especially for clients of the compiler API which might only run the "analyze" step. For these reasons, starting from Java 7, the frontend also has a more expensive check which supports 8.4.8.3 in full (Check::checkOverrideClashes). > > Of course, not being the author of that comment, this is only my best guess as to what that could mean :-) FWIW, I found a related bug: https://bugs.openjdk.java.net/browse/JDK-4362349. It might be responsible for that "(without transitivity)" caveat. ------------- PR: https://git.openjdk.java.net/jdk/pull/7311 From duke at openjdk.java.net Wed Feb 2 16:03:48 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Wed, 2 Feb 2022 16:03:48 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v17] In-Reply-To: References: Message-ID: <52o8K8q5wBP4HgBI3AljysgeR6tbogiOtQYu0VhWOAA=.80d5b306-f67f-4a87-836f-44bdbb0713f1@github.com> > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns.
> > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches.
> > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Update copyrights to 2022 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/78da1bd0..6255d4c8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=16 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=15-16 Stats: 16 lines in 16 files changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From hseigel at openjdk.java.net Wed Feb 2 18:20:49 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Wed, 2 Feb 2022 18:20:49 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v3] In-Reply-To: References: Message-ID: > Please review this new attempt to resolve JDK-8214976. This fix adds Pragmas to generate compilation errors, when using gcc, if calling a native system function instead of the os:: version of the function. The fix includes changes to calls in non-shared code because it is cleaner than adding PRAGMAs and, for some cases, the os:: version of a function has added value, such as asserts and RESTARTABLE. This fix slightly changes the signature of os::abort() so it wouldn't conflict with native abort() functions. Changes to Windows code is left for a future RFE. > > This fix was tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, Mach5 tiers 3-5 on Linux x64, and Mach5 builds of Zero, PPC, and s390. 
> > Thanks, Harold Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: Add warnings for ftruncate64 and lseek64 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7248/files - new: https://git.openjdk.java.net/jdk/pull/7248/files/ca2097e4..dd1820eb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7248&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7248&range=01-02 Stats: 12 lines in 3 files changed: 12 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7248.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7248/head:pull/7248 PR: https://git.openjdk.java.net/jdk/pull/7248 From phh at openjdk.java.net Wed Feb 2 20:58:02 2022 From: phh at openjdk.java.net (Paul Hohensee) Date: Wed, 2 Feb 2022 20:58:02 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v7] In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 01:52:46 GMT, Yi-Fan Tsai wrote: >> 8251505: Use of types in compiler shared code should be consistent. > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Use jlong instead of int64_t" Hi, David, I stand corrected. Is there a document somewhere about the policy, and has anyone gone through Hotspot to remove improper use of jlong? So, belay my jlong suggestion, but now compileBroker.* should use int64_t. I think my gc_globals.hpp comment still stands. ------------- PR: https://git.openjdk.java.net/jdk/pull/7294 From phh at openjdk.java.net Wed Feb 2 21:20:10 2022 From: phh at openjdk.java.net (Paul Hohensee) Date: Wed, 2 Feb 2022 21:20:10 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v7] In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 01:52:46 GMT, Yi-Fan Tsai wrote: >> 8251505: Use of types in compiler shared code should be consistent. 
> > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Use jlong instead of int64_t" Marked as reviewed by phh (Reviewer). The traversal mark type is signed right now, so I'd leave it signed for this PR and file another one if we want to change it to unsigned. There are quite a few places where signed types are used for values that are intuitively unsigned. One reason I can think of to keep using signed types is that it's easy to detect overflow/wrap-around: just check the sign bit. Allows a bit of time to check for overflow/wrap-around without keeping an old value around. ------------- PR: https://git.openjdk.java.net/jdk/pull/7294 From rrich at openjdk.java.net Wed Feb 2 21:22:10 2022 From: rrich at openjdk.java.net (Richard Reingruber) Date: Wed, 2 Feb 2022 21:22:10 GMT Subject: RFR: 8281043: Intrinsify recursive ObjectMonitor locking for PPC64 [v2] In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 15:16:50 GMT, Martin Doerr wrote: >> PPC64 implementation of JDK-8277180. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Shorter and better redable recursions increment sequence. Hi Martin, the change looks good. Have you tested it with a quick micro benchmark? The copyright needs to be updated. Cheers, Richard. ------------- PR: https://git.openjdk.java.net/jdk/pull/7305 From duke at openjdk.java.net Wed Feb 2 21:49:15 2022 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 2 Feb 2022 21:49:15 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v7] In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 01:52:46 GMT, Yi-Fan Tsai wrote: >> 8251505: Use of types in compiler shared code should be consistent. 
> > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Use jlong instead of int64_t" Other missed places to change: jvmci/jvmciEnv.hpp: long get_long_at(JVMCIPrimitiveArray array, int index); services/memReporter.hpp: inline long diff_in_current_scale(size_t s1, size_t s2) const { services/memReporter.hpp: long amount = (long)(s1 - s2); services/memReporter.hpp: long scale = (long)_scale; services/memReporter.cpp: long amount_diff = diff_in_current_scale(current_amount, early_amount); services/memReporter.cpp: long reserved_diff = diff_in_current_scale(current_reserved, early_reserved); services/memReporter.cpp: long committed_diff = diff_in_current_scale(current_committed, early_committed); services/memReporter.cpp: long overhead_diff = diff_in_current_scale(_current_baseline.malloc_tracking_overhead(), services/memReporter.cpp: long diff_used = diff_in_current_scale(current_stats.used(), services/memReporter.cpp: long diff_waste = diff_in_current_scale(current_waste, early_waste); runtime/vmThread.cpp: long interval_ms = SafepointTracing::time_since_last_safepoint_ms(); ------------- PR: https://git.openjdk.java.net/jdk/pull/7294 From david.holmes at oracle.com Wed Feb 2 22:13:34 2022 From: david.holmes at oracle.com (David Holmes) Date: Thu, 3 Feb 2022 08:13:34 +1000 Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v7] In-Reply-To: References: Message-ID: On 3/02/2022 6:58 am, Paul Hohensee wrote: > On Wed, 2 Feb 2022 01:52:46 GMT, Yi-Fan Tsai wrote: > >>> 8251505: Use of types in compiler shared code should be consistent. >> >> Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "Use jlong instead of int64_t" > > Hi, David, I stand corrected. Is there a document somewhere about the policy, and has anyone gone through Hotspot to remove improper use of jlong? 
Hi Paul, Sorry no documented policy, it is just something that a number of folk have raised in "recent" years about Java type pollution (mainly jlong) in various places in the VM. People have been making the switch piecemeal as different areas get worked on. Cheers, David > So, belay my jlong suggestion, but now compileBroker.* should use int64_t. I think my gc_globals.hpp comment still stands. > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/7294 From david.holmes at oracle.com Wed Feb 2 22:14:57 2022 From: david.holmes at oracle.com (David Holmes) Date: Thu, 3 Feb 2022 08:14:57 +1000 Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v7] In-Reply-To: References: Message-ID: <2c0f5a4e-6c60-100b-f0d3-a0a047926aef@oracle.com> On 3/02/2022 7:49 am, Evgeny Astigeevich wrote: > On Wed, 2 Feb 2022 01:52:46 GMT, Yi-Fan Tsai wrote: > >>> 8251505: Use of types in compiler shared code should be consistent. >> >> Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "Use jlong instead of int64_t" > > Other missed places to change: > > jvmci/jvmciEnv.hpp: long get_long_at(JVMCIPrimitiveArray array, int index); > services/memReporter.hpp: inline long diff_in_current_scale(size_t s1, size_t s2) const { > services/memReporter.hpp: long amount = (long)(s1 - s2); > services/memReporter.hpp: long scale = (long)_scale; > services/memReporter.cpp: long amount_diff = diff_in_current_scale(current_amount, early_amount); > services/memReporter.cpp: long reserved_diff = diff_in_current_scale(current_reserved, early_reserved); > services/memReporter.cpp: long committed_diff = diff_in_current_scale(current_committed, early_committed); > services/memReporter.cpp: long overhead_diff = diff_in_current_scale(_current_baseline.malloc_tracking_overhead(), > services/memReporter.cpp: long diff_used = diff_in_current_scale(current_stats.used(), > services/memReporter.cpp: long 
diff_waste = diff_in_current_scale(current_waste, early_waste); > runtime/vmThread.cpp: long interval_ms = SafepointTracing::time_since_last_safepoint_ms(); Other than jvmci these are not "compiler shared code" - other cleanups in other areas will need their own RFE. Cheers, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/7294 From cjplummer at openjdk.java.net Wed Feb 2 22:48:12 2022 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Wed, 2 Feb 2022 22:48:12 GMT Subject: RFR: 8281057: Fix doc references to overriding in JLS In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 16:19:01 GMT, Pavel Rappo wrote: > While looking into guts of javadoc comment inheritance, I noticed that a number of places in JDK seem to confuse JLS 8.4.6.** with JLS 8.4.8.**. > > Granted, "8.4.6 Method Throws" tangentially addresses overriding. However, I believe that the real target should be "8.4.8. Inheritance, Overriding, and Hiding" and its subsections. `com/sun/jdi/ReferenceType.java` changes look good. ------------- Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7311 From duke at openjdk.java.net Thu Feb 3 00:03:53 2022 From: duke at openjdk.java.net (Yi-Fan Tsai) Date: Thu, 3 Feb 2022 00:03:53 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v8] In-Reply-To: References: Message-ID: > 8251505: Use of types in compiler shared code should be consistent. 
Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Fix JVMCIEnv::get_long_at ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7294/files - new: https://git.openjdk.java.net/jdk/pull/7294/files/01f3b1f2..2c0eb15f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7294&range=06-07 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7294.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7294/head:pull/7294 PR: https://git.openjdk.java.net/jdk/pull/7294 From phh at openjdk.java.net Thu Feb 3 00:08:09 2022 From: phh at openjdk.java.net (Paul Hohensee) Date: Thu, 3 Feb 2022 00:08:09 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v8] In-Reply-To: References: Message-ID: On Thu, 3 Feb 2022 00:03:53 GMT, Yi-Fan Tsai wrote: >> 8251505: Use of types in compiler shared code should be consistent. > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Fix JVMCIEnv::get_long_at Lgtm. ------------- Marked as reviewed by phh (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7294 From dholmes at openjdk.java.net Thu Feb 3 01:56:15 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 3 Feb 2022 01:56:15 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v14] In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 10:18:38 GMT, Andrew Haley wrote: >> And this change will keep ROP protection enabled if we fall into the "this VM was built without ROP-protection support.". In that case we'll be protecting generated code, but the VM itself won't be protected. This will run without crashing. > >> And this change will keep ROP protection enabled if we fall into the "this VM was built without ROP-protection support.". 
In that case we'll be protecting generated code, but the VM itself won't be protected. This will run without crashing. > > That's perfect. Okay, that makes sense. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From minqi at openjdk.java.net Thu Feb 3 02:51:13 2022 From: minqi at openjdk.java.net (Yumin Qi) Date: Thu, 3 Feb 2022 02:51:13 GMT Subject: RFR: 8278753: Runtime crashes with access violation during JNI_CreateJavaVM call In-Reply-To: References: Message-ID: On Tue, 25 Jan 2022 00:20:19 GMT, Yumin Qi wrote: > Please review, > When jlink with --compress=2, zip is used to compress the files while doing copy. The user case failed to load zip.dll, since zip.dll is not set in PATH. This failure is after we get NULL from GetModuleHandle("zip.dll"), then do LoadLibrary("zip.dll") will have same result. > The fix is calling load_zip_library of ClassLoader first --- if zip library already loaded just return the cached handle for following usage, if not, load zip library and cached the handle. > > Tests: tier1,4,7 in test > Manually tested user case, and checked output of jimage list for jlinked files using --compress=2. > > Thanks > Yumin Since no further update, I will integrate tomorrow. ------------- PR: https://git.openjdk.java.net/jdk/pull/7206 From dlong at openjdk.java.net Thu Feb 3 04:20:41 2022 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 3 Feb 2022 04:20:41 GMT Subject: RFR: 8271055: Crash during deoptimization with "assert(bb->is_reachable()) failed: getting result from unreachable basicblock" with -XX:+VerifyStack Message-ID: Reproduced the problem with a new JASM test rather than relying on idiosyncrasies of javac. The fix is to not look at the next instruction (which might be the beginning of an unreachable block) if the current instruction doesn't fall through (like "goto"!). 
------------- Commit messages: - Don't look at next bytecode if the current bytecode doesn't fall through Changes: https://git.openjdk.java.net/jdk/pull/7331/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7331&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271055 Stats: 121 lines in 3 files changed: 120 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7331.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7331/head:pull/7331 PR: https://git.openjdk.java.net/jdk/pull/7331 From stuefe at openjdk.java.net Thu Feb 3 05:37:47 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 3 Feb 2022 05:37:47 GMT Subject: RFR: JDK-8281023: NMT integration into pp debug command does not work [v2] In-Reply-To: References: Message-ID: On Wed, 2 Feb 2022 13:30:26 GMT, Zhengyu Gu wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Zhengyus remarks > > src/hotspot/share/services/mallocTracker.cpp line 301: > >> 299: bool MallocTracker::print_pointer_information(const void* p, outputStream* st) { >> 300: assert(MemTracker::enabled(), "NMT must be enabled"); >> 301: if (CanUseSafeFetchN() && os::is_readable_pointer(p)) { > > `os::is_readable_pointer()` uses `CanUseSafeFetch32()`, you may want to check `CanUseSafeFetch32()` instead of `CanUseSafeFetchN()`. Good point. Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/7297 From stuefe at openjdk.java.net Thu Feb 3 05:37:45 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 3 Feb 2022 05:37:45 GMT Subject: RFR: JDK-8281023: NMT integration into pp debug command does not work [v3] In-Reply-To: References: Message-ID: > JDK-8280289 enhanced the debug pp() command to use NMT if enabled, and to print NMT related info. That is useful, but there are some issues. 
> > On debug, it just asserts, since the empty reserved region we create to hold the output of the mmap-search is created with address=NULL: > > > (gdb) call pp(0x7ffff010b030) > > "Executing pp" > > Thread 2 "java" received signal SIGSEGV, Segmentation fault. > 0x00007ffff6721a71 in VirtualMemoryRegion::VirtualMemoryRegion (this=this at entry=0x7ffff5bb2620, addr=addr at entry=0x0, size=size at entry=0) at /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/services/virtualMemoryTracker.hpp:180 > 180 assert(addr != NULL, "Invalid address"); > > > On release we don't assert and get further, but the use of SafeFetch is slightly wrong. It will deny us any NMT data about p if *p==0: > > > if (CanUseSafeFetchN() && SafeFetchN((intptr_t*)p, 0) != 0) { > > > This patch: > - fixes uses of SafeFetch > - changes the mmap-region-search-code to not require an empty ReservedMemoryRegion in order to avoid triggering the assert in virtualMemoryTracker.hpp:180 > - adds a comment about the safe use of pp() in gdb (one needs to switch off signal handling of SIGSEGV for this to work) > > Tests: > - I tested manually that pp works with different levels of NMT (Linux x64) > - GHAs in process Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Use CanSafeFetch32 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7297/files - new: https://git.openjdk.java.net/jdk/pull/7297/files/10dc66ed..10a24978 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7297&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7297&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7297.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7297/head:pull/7297 PR: https://git.openjdk.java.net/jdk/pull/7297 From vlivanov at openjdk.java.net Thu Feb 3 06:52:11 2022 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 3 Feb 2022 06:52:11 
GMT Subject: RFR: 8271055: Crash during deoptimization with "assert(bb->is_reachable()) failed: getting result from unreachable basicblock" with -XX:+VerifyStack In-Reply-To: References: Message-ID: <-o_Q8_R9Mz3I70DbL_STe7RQfq-59Tca_Ke87HxgE8I=.e479c4bc-3166-4faf-969c-e03490479cae@github.com> On Thu, 3 Feb 2022 04:11:38 GMT, Dean Long wrote: > Reproduced the problem with a new JASM test rather than relying on idiosyncrasies of javac. > The fix is to not look at the next instruction (which might be the beginning of an unreachable block) if the current instruction doesn't fall through (like "goto"!). Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7331 From thartmann at openjdk.java.net Thu Feb 3 07:02:12 2022 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 3 Feb 2022 07:02:12 GMT Subject: RFR: 8271055: Crash during deoptimization with "assert(bb->is_reachable()) failed: getting result from unreachable basicblock" with -XX:+VerifyStack In-Reply-To: References: Message-ID: On Thu, 3 Feb 2022 04:11:38 GMT, Dean Long wrote: > Reproduced the problem with a new JASM test rather than relying on idiosyncrasies of javac. > The fix is to not look at the next instruction (which might be the beginning of an unreachable block) if the current instruction doesn't fall through (like "goto"!). Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7331 From ioi.lam at oracle.com Thu Feb 3 07:30:46 2022 From: ioi.lam at oracle.com (Ioi Lam) Date: Wed, 2 Feb 2022 23:30:46 -0800 Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization Message-ID: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> Please see the bug report [1] for detailed description and test cases. I'd like to have some discussion before we can decide what to do. I discovered this issue when analyzing JDK-8279484 [2]. 
Under Kubernetes (minikube), Runtime.availableProcessors() returns 1, despite the fact that the machine has 32 CPUs, the Kubernetes node has a single deployment, and no CPU limits were set. Specifically, I want to understand why the JDK is using CgroupSubsystem::cpu_shares() to limit the number of CPUs used by the Java process. In cgroup, there are other ways that are designed specifically for limiting the number of CPUs, i.e., CgroupSubsystem::cpu_quota(). Why is using cpu_quota() alone not enough? Why did we choose the current approach of considering both cpu_quota() and cpu_shares()? My guess is that sometimes people don't limit the actual number of CPUs per container, but instead use CPU Shares to set the relative scheduling priority between containers. I.e., they run "docker run --cpu-shares=1234" without using the "--cpus" flag. If this is indeed the reason, I can understand the (good) intention, but the solution seems awfully insufficient. CPU Shares is a *relative* number. How much CPU is allocated to you depends on - how many other processes are actively running - what their CPU Shares are The above information can change dynamically, as other processes may be added or removed, and they can change between active and idle states. However, the JVM treats CPU Shares as an *absolute/static* number, and sets the CPU quota of the current process using this very simplistic formula. Value of /sys/fs/cgroup/cpu.shares -> cpu quota:
    1023 -> 1 CPU
    1024 -> no limit (huh??)
    2048 -> 2 CPUs
    4096 -> 4 CPUs
This seems just wrong to me. There's no way you can get a "correct" result without knowing anything about other processes that are running at the same time. The net effect is when Java is running under a container, more likely than not, the JVM will limit itself to a single CPU. This seems really inefficient to me. What should we do?
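[Editor's sketch] The mapping in the table above can be written down in a few lines. This is only an illustration of the behaviour described in this message, not the HotSpot source: the function name `cpu_limit_from_shares` and the constants `kPerCpuShares` and `kNoLimit` are made up for the example, and treating non-positive values as "no limit" is an assumption.

```cpp
// Illustrative only: reproduces the cpu.shares -> CPU count table from the
// message above. All names are hypothetical; this is not HotSpot code.
constexpr int kPerCpuShares = 1024;  // default share value, treated as "unset"
constexpr int kNoLimit = -1;         // marker meaning "no limit derived from shares"

int cpu_limit_from_shares(int shares) {
    // Assumption: a non-positive or default value means no limit is applied.
    if (shares <= 0 || shares == kPerCpuShares) {
        return kNoLimit;
    }
    // Integer ceiling of shares / 1024, so any value below 1024 yields 1 CPU.
    return (shares + kPerCpuShares - 1) / kPerCpuShares;
}
```

Note that the result depends only on the share value itself. Neither the number of host CPUs nor the shares held by other containers enters the calculation, which is the problem being raised.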
Thanks - Ioi [1] https://bugs.openjdk.java.net/browse/JDK-8281181 [2] https://bugs.openjdk.java.net/browse/JDK-8279484 From iklam at openjdk.java.net Thu Feb 3 07:41:07 2022 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 3 Feb 2022 07:41:07 GMT Subject: RFR: JDK-8281023: NMT integration into pp debug command does not work [v3] In-Reply-To: References: Message-ID: On Thu, 3 Feb 2022 05:37:45 GMT, Thomas Stuefe wrote: >> JDK-8280289 enhanced the debug pp() command to use NMT if enabled, and to print NMT related info. That is useful, but there are some issues. >> >> On debug, it just asserts, since the empty reserved region we create to hold the output of the mmap-search is created with address=NULL: >> >> >> (gdb) call pp(0x7ffff010b030) >> >> "Executing pp" >> >> Thread 2 "java" received signal SIGSEGV, Segmentation fault. >> 0x00007ffff6721a71 in VirtualMemoryRegion::VirtualMemoryRegion (this=this at entry=0x7ffff5bb2620, addr=addr at entry=0x0, size=size at entry=0) at /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/services/virtualMemoryTracker.hpp:180 >> 180 assert(addr != NULL, "Invalid address"); >> >> >> On release we don't assert and get further, but the use of SafeFetch is slightly wrong. It will deny us any NMT data about p if *p==0: >> >> >> if (CanUseSafeFetchN() && SafeFetchN((intptr_t*)p, 0) != 0) { >> >> >> This patch: >> - fixes uses of SafeFetch >> - changes the mmap-region-search-code to not require an empty ReservedMemoryRegion in order to avoid triggering the assert in virtualMemoryTracker.hpp:180 >> - adds a comment about the safe use of pp() in gdb (one needs to switch off signal handling of SIGSEGV for this to work) >> >> Tests: >> - I tested manually that pp works with different levels of NMT (Linux x64) >> - GHAs in process > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Use CanSafeFetch32 Marked as reviewed by iklam (Reviewer). 
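[Editor's sketch] The SafeFetch problem quoted above (a readable location holding 0 is reported as unreadable) is commonly avoided by probing twice with two different error values. The block below illustrates that idea under loud assumptions: `probe_fetch` is a hypothetical stand-in that only simulates a fault for a null pointer, and none of these names are HotSpot APIs.

```cpp
#include <cstdint>

// Two distinct sentinels; a readable location cannot equal both at once.
constexpr std::int32_t kSentinelA = 0x1234ABCD;
constexpr std::int32_t kSentinelB = ~kSentinelA;

// Hypothetical stand-in for a fault-tolerant load: returns *p, or err_value
// when the load "faults". Here a fault is simulated for nullptr only.
std::int32_t probe_fetch(const std::int32_t* p, std::int32_t err_value) {
    return p == nullptr ? err_value : *p;
}

// The pattern criticised in the quoted description: a stored 0 is
// indistinguishable from a fault, so valid memory holding 0 is rejected.
bool naive_is_readable(const std::int32_t* p) {
    return probe_fetch(p, 0) != 0;
}

// Two-probe variant: the pointer is unreadable only if both probes return
// their own error value.
bool two_probe_is_readable(const std::int32_t* p) {
    return probe_fetch(p, kSentinelA) != kSentinelA ||
           probe_fetch(p, kSentinelB) != kSentinelB;
}
```

With a readable `int32_t` holding 0, `naive_is_readable` reports false while `two_probe_is_readable` reports true. Both report false for the simulated fault.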
------------- PR: https://git.openjdk.java.net/jdk/pull/7297 From iklam at openjdk.java.net Thu Feb 3 07:41:08 2022 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 3 Feb 2022 07:41:08 GMT Subject: RFR: JDK-8281023: NMT integration into pp debug command does not work [v3] In-Reply-To: <02M2JuFVSa-pZX75zgzYLEpBO7A_kihJZeSws5lrMCc=.e0f89c55-6f74-42b3-9307-d00449cd0730@github.com> References: <02M2JuFVSa-pZX75zgzYLEpBO7A_kihJZeSws5lrMCc=.e0f89c55-6f74-42b3-9307-d00449cd0730@github.com> Message-ID: On Wed, 2 Feb 2022 08:24:10 GMT, Thomas Stuefe wrote: > Note that I think @iklam was overcautious in the original discussion. We already risk signals in the debugging session by using SafeFetch to read the malloc header. Using a reference to a potentially dead ReservedMemoryRegion runs an additional - very low - risk of signals. So, if we really want to be cautious, we should not print out NMT information at all in pp. But I think NMT is very useful and worth the risk. All we risk is a slightly miffed debugger. I wasn't trying to make life unnecessarily difficult, and I agree that printing out the NMT info is a great idea. All I was suggesting was -- if there's a less risky way to do the printing and that's no overly complicated, we should do it that way. And the new code looks good to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/7297 From stuefe at openjdk.java.net Thu Feb 3 07:48:08 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 3 Feb 2022 07:48:08 GMT Subject: RFR: JDK-8281023: NMT integration into pp debug command does not work [v3] In-Reply-To: References: <02M2JuFVSa-pZX75zgzYLEpBO7A_kihJZeSws5lrMCc=.e0f89c55-6f74-42b3-9307-d00449cd0730@github.com> Message-ID: On Thu, 3 Feb 2022 07:37:40 GMT, Ioi Lam wrote: > > Note that I think @iklam was overcautious in the original discussion. We already risk signals in the debugging session by using SafeFetch to read the malloc header. 
Using a reference to a potentially dead ReservedMemoryRegion runs an additional - very low - risk of signals. So, if we really want to be cautious, we should not print out NMT information at all in pp. But I think NMT is very useful and worth the risk. All we risk is a slightly miffed debugger. > > I wasn't trying to make life unnecessarily difficult, and I agree that printing out the NMT info is a great idea. > > All I was suggesting was -- if there's a less risky way to do the printing and that's no overly complicated, we should do it that way. And the new code looks good to me. Thank you for the review! If pp() were used outside of debugging, I'd agree with your original estimate. ------------- PR: https://git.openjdk.java.net/jdk/pull/7297 From david.holmes at oracle.com Thu Feb 3 09:19:10 2022 From: david.holmes at oracle.com (David Holmes) Date: Thu, 3 Feb 2022 19:19:10 +1000 Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization In-Reply-To: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> References: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> Message-ID: <44ce9669-71cc-0c20-ecbf-265845626820@oracle.com> Hi Ioi, For the benefit of the mailing list discussion ... On 3/02/2022 5:30 pm, Ioi Lam wrote: > Please see the bug report [1] for detailed description and test cases. > > I'd like to have some discussion before we can decide what to do. > > I discovered this issue when analyzing JDK-8279484 [2]. Under Kubernetes > (minikube), Runtime.availableProcessors() returns 1, despite that the > fact the machine has 32 CPUs, the Kubernetes node has a single > deployment, and no CPU limits were set. > > Specifically, I want to understand why the JDK is using > CgroupSubsystem::cpu_shares() to limit the number of CPUs used by the > Java process. Because we were asked to by customers deploying in containers. 
> In cgroup, there are other ways that are designed specifically for > limiting the number of CPUs, i.e., CgroupSubsystem::cpu_quota(). Why is > using cpu_quota() alone not enough? Why did we choose the current > approach of considering both cpu_quota() and cpu_shares()? Because people were using both (whether that made sense or not) and so we needed a policy on what to do if both were set. > My guess is that sometimes people don't limit the actual number of CPUs > per container, but instead use CPU Shares to set the relative scheduling > priority between containers. > > I.e., they run "docker run --cpu-shares=1234" without using the "--cpus" > flag. > > If this is indeed the reason, I can understand the (good) intention, but > the solution seems awfully insufficient. > > CPU Shares is a *relative* number. How much CPU is allocated to you > depends on > > - how many other processes are actively running > - what their CPU Shares are > > The above information can change dynamically, as other processes may be > added or removed, and they can change between active and idle states. > > However, the JVM treats CPU Shares as an *absolute/static* number, and > sets the CPU quota of the current process using this very simplistic > formula. From old discussion and the code I believe the thought was that share was relative to the per-cpu default shares of 1024. So we use that to determine the fraction of each CPU that should be assigned, and we should then use that to determine the available number of CPUs. But that isn't what we actually do - we only calculate the fraction and round it up to get the number of CPUs and that is wrong (and typically only gives 1 cpu because shares < 1024). I speculate that what was intended was to map from having an X% share of each CPU, to instead having access to X% of the total CPUs (at 100% of each). Mathematically this has some basis but it actually makes no practical sense from a throughput or response time perspective.
If I'm allowed 50% of the CPU per time period to do my calculations, I want 100% of each CPU for half of the period as that potentially minimises the elapsed time till I have a result. > Value of /sys/fs/cgroup/cpu.shares -> cpu quota: > > 1023 -> 1 CPU > 1024 -> no limit (huh??) > 2048 -> 2 CPUs > 4096 -> 4 CPUs > > This seems just wrong to me. There's no way you can get a "correct" > result without knowing anything about other processes that are running > at the same time. As I said above and in the bug report I think this was an error and the intent was to then multiply by the number of actual processors. > The net effect is when Java is running under a container, more likely > than not, the JVM will limit itself to a single CPU. This seems really > inefficient to me. Yes. > What should we do? We could just adjust the calculation as I suggested. Or, given that share aka weight is meaningless without knowing the total weight in the system we could just ignore it. The app then gets access to all cpu's and it is up to the container to track actual usage and impose any limits configured. I've always thought that these cgroups mechanisms were fundamentally flawed and that if the intent was to define a resource limited environment, then the environment should report what resources were available by the normal APIs. They got this right with cpu-sets by integrating with sched_getaffinity; but for shares and quotas it has been left to the applications to try and figure out what that should mean - and that makes no sense to me.
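[Editor's sketch] One possible reading of the adjustment suggested above, "multiply by the number of actual processors", is sketched below. It is an interpretation for illustration, not a proposed patch: the name `cpus_from_share_fraction`, the clamping to the range [1, host_cpus], and the fallback for non-positive inputs are all assumptions.

```cpp
#include <algorithm>

// Illustrative only: treat shares / 1024 as the fraction of the machine the
// process may use and scale it by the host CPU count. Names are hypothetical.
int cpus_from_share_fraction(int host_cpus, int shares) {
    constexpr long long kPerCpuShares = 1024;
    // Assumption: with no usable input, fall back to the host CPU count.
    if (host_cpus <= 0 || shares <= 0) {
        return std::max(host_cpus, 1);
    }
    // Integer ceiling of host_cpus * shares / 1024.
    const long long scaled =
        (static_cast<long long>(host_cpus) * shares + kPerCpuShares - 1) / kPerCpuShares;
    // Never report fewer than 1 CPU or more than the host has.
    return static_cast<int>(std::clamp<long long>(scaled, 1, host_cpus));
}
```

On a 32-CPU host, a share value of 512 would give 16 CPUs under this reading, where rounding the fraction alone gives 1. The objection in the message still applies: the result ignores what other containers are doing.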
Cheers, David > Thanks > - Ioi > > [1] https://bugs.openjdk.java.net/browse/JDK-8281181 > [2] https://bugs.openjdk.java.net/browse/JDK-8279484 From mdoerr at openjdk.java.net Thu Feb 3 10:16:45 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 3 Feb 2022 10:16:45 GMT Subject: RFR: 8281043: Intrinsify recursive ObjectMonitor locking for PPC64 [v3] In-Reply-To: References: Message-ID: <7ErBIaYMp6HAZqIyG-r8_B9EI3sw4hu3VzZ54SPYaKk=.3db6e7d9-cb10-4532-b1c9-19a6a3239f58@github.com> > PPC64 implementation of JDK-8277180. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Update Copyright years. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7305/files - new: https://git.openjdk.java.net/jdk/pull/7305/files/1eec2373..88861201 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7305&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7305&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7305.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7305/head:pull/7305 PR: https://git.openjdk.java.net/jdk/pull/7305 From duke at openjdk.java.net Thu Feb 3 10:35:11 2022 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 3 Feb 2022 10:35:11 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v8] In-Reply-To: References: Message-ID: On Thu, 3 Feb 2022 00:03:53 GMT, Yi-Fan Tsai wrote: >> 8251505: Use of types in compiler shared code should be consistent. > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Fix JVMCIEnv::get_long_at lgtm ------------- Marked as reviewed by eastig at github.com (no known OpenJDK username). 
PR: https://git.openjdk.java.net/jdk/pull/7294 From sgehwolf at redhat.com Thu Feb 3 11:29:46 2022 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Thu, 03 Feb 2022 12:29:46 +0100 Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization In-Reply-To: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> References: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> Message-ID: <5dbfb77029a00d67542a9104855b2d98a3d8ce5e.camel@redhat.com> Hi Ioi, On Wed, 2022-02-02 at 23:30 -0800, Ioi Lam wrote: > Please see the bug report [1] for detailed description and test cases. > > I'd like to have some discussion before we can decide what to do. > > I discovered this issue when analyzing JDK-8279484 [2]. Under Kubernetes > (minikube), Runtime.availableProcessors() returns 1, despite the > fact that the machine has 32 CPUs, the Kubernetes node has a single > deployment, and no CPU limits were set. From looking at the bug it would be good to know why a cpu.weight value of 1 is being observed. The default is 100. I.e.
if it is really unset: $ sudo docker run --rm -v $(pwd)/jdk17:/opt/jdk:z fedora:35 /opt/jdk/bin/java -Xlog:os+container=trace --version [0.000s][trace][os,container] OSContainer::init: Initializing Container Support [0.001s][debug][os,container] Detected cgroups v2 unified hierarchy [0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup//memory.max [0.001s][trace][os,container] Raw value for memory limit is: max [0.001s][trace][os,container] Memory Limit is: Unlimited [0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max [0.001s][trace][os,container] Raw value for CPU quota is: max [0.001s][trace][os,container] CPU Quota is: -1 [0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max [0.001s][trace][os,container] CPU Period is: 100000 [0.001s][trace][os,container] Path to /cpu.weight is /sys/fs/cgroup//cpu.weight [0.001s][trace][os,container] Raw value for CPU shares is: 100 [0.001s][debug][os,container] CPU Shares is: -1 [0.001s][trace][os,container] OSContainer::active_processor_count: 4 [0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4 [0.001s][debug][os,container] container memory limit unlimited: -1, using host value [0.001s][debug][os,container] container memory limit unlimited: -1, using host value [0.002s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4 [0.007s][debug][os,container] container memory limit unlimited: -1, using host value [0.014s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4 [0.022s][trace][os,container] Path to /memory.max is /sys/fs/cgroup//memory.max [0.022s][trace][os,container] Raw value for memory limit is: max [0.022s][trace][os,container] Memory Limit is: Unlimited [0.022s][debug][os,container] container memory limit unlimited: -1, using host value openjdk 17.0.2-internal 2022-01-18 OpenJDK Runtime Environment (build 17.0.2-internal+0-adhoc.sgehwolf.jdk17u) OpenJDK 64-Bit Server VM (build 
17.0.2-internal+0-adhoc.sgehwolf.jdk17u, mixed mode, sharing) > Specifically, I want to understand why the JDK is using > CgroupSubsystem::cpu_shares() to limit the number of CPUs used by the > Java process. TLDR: Kubernetes and/or other container orchestration frameworks? That was back in the day of cgroups v1, though. > In cgroup, there are other ways that are designed specifically for > limiting the number of CPUs, i.e., CgroupSubsystem::cpu_quota(). Why is > using cpu_quota() alone not enough? Why did we choose the current > approach of considering both cpu_quota() and cpu_shares()? Kubernetes has a concept of "cpu requests" and "cpu limit". It maps (or mapped?) those values to cpu shares and cpu quota in cgroups. > My guess is that sometimes people don't limit the actual number of CPUs > per container, but instead use CPU Shares to set the relative scheduling > priority between containers. > > I.e., they run "docker run --cpu-shares=1234" without using the "--cpus" > flag. > > If this is indeed the reason, I can understand the (good) intention, but > the solution seems awfully insufficient. > > CPU Shares is a *relative* number. How much CPU is allocated to you > depends on > > - how many other processes are actively running > - what their CPU Shares are > > The above information can change dynamically, as other processes may be > added or removed, and they can change between active and idle states. > > However, the JVM treats CPU Shares as an *absolute/static* number, and > sets the CPU quota of the current process using this very simplistic > formula. > > Value of /sys/fs/cgroup/cpu.shares -> cpu quota: > > 1023 -> 1 CPU > 1024 -> no limit (huh??) > 2048 -> 2 CPUs > 4096 -> 4 CPUs > > This seems just wrong to me. There's no way you can get a "correct" > result without knowing anything about other processes that are running > at the same time.
> > The net effect is when Java is running under a container, more likely > that not, the JVM will limit itself to a single CPU. This seems really > inefficient to me. I believe the point is that popular container orchestration frameworks use the cpu requests feature to map to cpu.shares. A similar question regarding this was asked by myself a while ago. See JDK-8216366. Here is what Bob Vandette had to say at the time: http://mail.openjdk.java.net/pipermail/hotspot-dev/2019-January/036093.html Thanks, Severin > > What should we do? > > Thanks > - Ioi > > [1] https://bugs.openjdk.java.net/browse/JDK-8281181 > [2] https://bugs.openjdk.java.net/browse/JDK-8279484 > From duke at openjdk.java.net Thu Feb 3 11:34:11 2022 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 3 Feb 2022 11:34:11 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. In-Reply-To: <2c0f5a4e-6c60-100b-f0d3-a0a047926aef@oracle.com> References: <2c0f5a4e-6c60-100b-f0d3-a0a047926aef@oracle.com> Message-ID: <4FQ-YEzWaDvXTIqe6IYjQUlJCSS7vaoywDQfu7DNX0Y=.3de04253-c875-4bb4-9238-07d48d713183@github.com> On Wed, 2 Feb 2022 22:16:31 GMT, David Holmes wrote: > > Other missed places to change: > > jvmci/jvmciEnv.hpp: long get_long_at(JVMCIPrimitiveArray array, int index); > > services/memReporter.hpp: inline long diff_in_current_scale(size_t s1, size_t s2) const { > > services/memReporter.hpp: long amount = (long)(s1 - s2); > > services/memReporter.hpp: long scale = (long)_scale; > > services/memReporter.cpp: long amount_diff = diff_in_current_scale(current_amount, early_amount); > > services/memReporter.cpp: long reserved_diff = diff_in_current_scale(current_reserved, early_reserved); > > services/memReporter.cpp: long committed_diff = diff_in_current_scale(current_committed, early_committed); > > services/memReporter.cpp: long overhead_diff = diff_in_current_scale(_current_baseline.malloc_tracking_overhead(), > > services/memReporter.cpp: long diff_used = 
diff_in_current_scale(current_stats.used(), > > services/memReporter.cpp: long diff_waste = diff_in_current_scale(current_waste, early_waste); > > runtime/vmThread.cpp: long interval_ms = SafepointTracing::time_since_last_safepoint_ms(); > > Other than jvmci these are not "compiler shared code" - other cleanups in other areas will need their own RFE. > > Cheers, David Created: https://bugs.openjdk.java.net/browse/JDK-8281213 https://bugs.openjdk.java.net/browse/JDK-8281214 ------------- PR: https://git.openjdk.java.net/jdk/pull/7294 From duke at openjdk.java.net Thu Feb 3 12:14:18 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Thu, 3 Feb 2022 12:14:18 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v17] In-Reply-To: <52o8K8q5wBP4HgBI3AljysgeR6tbogiOtQYu0VhWOAA=.80d5b306-f67f-4a87-836f-44bdbb0713f1@github.com> References: <52o8K8q5wBP4HgBI3AljysgeR6tbogiOtQYu0VhWOAA=.80d5b306-f67f-4a87-836f-44bdbb0713f1@github.com> Message-ID: On Wed, 2 Feb 2022 16:03:48 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). 
>> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Update copyrights to 2022 As mentioned on the CSR, the JEP is being dropped - unless anyone has any objections. JDK-8277204 will become a normal RFE.
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Thu Feb 3 13:05:12 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 3 Feb 2022 13:05:12 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v17] In-Reply-To: References: <52o8K8q5wBP4HgBI3AljysgeR6tbogiOtQYu0VhWOAA=.80d5b306-f67f-4a87-836f-44bdbb0713f1@github.com> Message-ID: <_Bbro08HLFKOtrG9jBdy9s3W6FOgeqZkh0_Sttkm8EM=.a7080cfc-2c89-434b-9898-4f9e1ceb4817@github.com> On Thu, 3 Feb 2022 12:11:16 GMT, Alan Hayward wrote: > As mentioned on the CSR, the JEP is being dropped - unless anyone has any objections. JDK-8277204 will become a normal RFE. Good decision. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From mdoerr at openjdk.java.net Thu Feb 3 13:16:05 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 3 Feb 2022 13:16:05 GMT Subject: RFR: 8281043: Intrinsify recursive ObjectMonitor locking for PPC64 [v3] In-Reply-To: <7ErBIaYMp6HAZqIyG-r8_B9EI3sw4hu3VzZ54SPYaKk=.3db6e7d9-cb10-4532-b1c9-19a6a3239f58@github.com> References: <7ErBIaYMp6HAZqIyG-r8_B9EI3sw4hu3VzZ54SPYaKk=.3db6e7d9-cb10-4532-b1c9-19a6a3239f58@github.com> Message-ID: <87IrdT-fI5RIWDXfr5YY5lZ27U1v9XT30A-moZz3Mn4=.22447a22-affa-48c8-b947-a787f6570bcd@github.com> On Thu, 3 Feb 2022 10:16:45 GMT, Martin Doerr wrote: >> PPC64 implementation of JDK-8277180. >> >> `java -Xms4g -Xmx4g -jar dacapo-9.12-bach.jar h2 -s huge -t 1 -n 1 --max-iterations=35 --variance=5 --verbose --converge` >> >> Before this patch (2 runs): >> `===== DaCapo 9.12 h2 PASSED in 309753 msec =====` >> `===== DaCapo 9.12 h2 PASSED in 300755 msec =====` >> >> After: >> `===== DaCapo 9.12 h2 PASSED in 285144 msec =====` >> `===== DaCapo 9.12 h2 PASSED in 288255 msec =====` > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright years. Thanks for the review! 
Copyright updated and benchmark results added above. ------------- PR: https://git.openjdk.java.net/jdk/pull/7305 From zgu at openjdk.java.net Thu Feb 3 13:22:07 2022 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 3 Feb 2022 13:22:07 GMT Subject: RFR: JDK-8281023: NMT integration into pp debug command does not work [v3] In-Reply-To: References: Message-ID: <3-lNY1lndmGQSA0Lo4d4mWVqOXnKHw7rReENOOBHzZs=.3d95213a-4a75-4b80-adff-151de90b3339@github.com> On Thu, 3 Feb 2022 05:37:45 GMT, Thomas Stuefe wrote: >> JDK-8280289 enhanced the debug pp() command to use NMT if enabled, and to print NMT related info. That is useful, but there are some issues. >> >> On debug, it just asserts, since the empty reserved region we create to hold the output of the mmap-search is created with address=NULL: >> >> >> (gdb) call pp(0x7ffff010b030) >> >> "Executing pp" >> >> Thread 2 "java" received signal SIGSEGV, Segmentation fault. >> 0x00007ffff6721a71 in VirtualMemoryRegion::VirtualMemoryRegion (this=this at entry=0x7ffff5bb2620, addr=addr at entry=0x0, size=size at entry=0) at /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/services/virtualMemoryTracker.hpp:180 >> 180 assert(addr != NULL, "Invalid address"); >> >> >> On release we don't assert and get further, but the use of SafeFetch is slightly wrong. 
It will deny us any NMT data about p if *p==0: >> >> >> if (CanUseSafeFetchN() && SafeFetchN((intptr_t*)p, 0) != 0) { >> >> >> This patch: >> - fixes uses of SafeFetch >> - changes the mmap-region-search-code to not require an empty ReservedMemoryRegion in order to avoid triggering the assert in virtualMemoryTracker.hpp:180 >> - adds a comment about the safe use of pp() in gdb (one needs to switch off signal handling of SIGSEGV for this to work) >> >> Tests: >> - I tested manually that pp works with different levels of NMT (Linux x64) >> - GHAs in process > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Use CanSafeFetch32 Marked as reviewed by zgu (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/7297 From rrich at openjdk.java.net Thu Feb 3 14:05:10 2022 From: rrich at openjdk.java.net (Richard Reingruber) Date: Thu, 3 Feb 2022 14:05:10 GMT Subject: RFR: 8281043: Intrinsify recursive ObjectMonitor locking for PPC64 [v3] In-Reply-To: <7ErBIaYMp6HAZqIyG-r8_B9EI3sw4hu3VzZ54SPYaKk=.3db6e7d9-cb10-4532-b1c9-19a6a3239f58@github.com> References: <7ErBIaYMp6HAZqIyG-r8_B9EI3sw4hu3VzZ54SPYaKk=.3db6e7d9-cb10-4532-b1c9-19a6a3239f58@github.com> Message-ID: On Thu, 3 Feb 2022 10:16:45 GMT, Martin Doerr wrote: >> PPC64 implementation of JDK-8277180. >> >> `java -Xms4g -Xmx4g -jar dacapo-9.12-bach.jar h2 -s huge -t 1 -n 1 --max-iterations=35 --variance=5 --verbose --converge` >> >> Before this patch (2 runs): >> `===== DaCapo 9.12 h2 PASSED in 309753 msec =====` >> `===== DaCapo 9.12 h2 PASSED in 300755 msec =====` >> >> After: >> `===== DaCapo 9.12 h2 PASSED in 285144 msec =====` >> `===== DaCapo 9.12 h2 PASSED in 288255 msec =====` > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright years. Looks good. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7305 From stuefe at openjdk.java.net Thu Feb 3 14:15:15 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 3 Feb 2022 14:15:15 GMT Subject: RFR: JDK-8281023: NMT integration into pp debug command does not work [v3] In-Reply-To: <3-lNY1lndmGQSA0Lo4d4mWVqOXnKHw7rReENOOBHzZs=.3d95213a-4a75-4b80-adff-151de90b3339@github.com> References: <3-lNY1lndmGQSA0Lo4d4mWVqOXnKHw7rReENOOBHzZs=.3d95213a-4a75-4b80-adff-151de90b3339@github.com> Message-ID: On Thu, 3 Feb 2022 13:18:49 GMT, Zhengyu Gu wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Use CanSafeFetch32 > > Marked as reviewed by zgu (Reviewer). Thanks @zhengyu123 and @iklam. ------------- PR: https://git.openjdk.java.net/jdk/pull/7297 From stuefe at openjdk.java.net Thu Feb 3 14:15:15 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 3 Feb 2022 14:15:15 GMT Subject: Integrated: JDK-8281023: NMT integration into pp debug command does not work In-Reply-To: References: Message-ID: <0BG4ktv-tf6tZ2V4J7aM-2n405mpnTugPT3PqK3uE9o=.fe33f34f-e79d-4151-8a53-ec93e1fd99eb@github.com> On Tue, 1 Feb 2022 08:36:42 GMT, Thomas Stuefe wrote: > JDK-8280289 enhanced the debug pp() command to use NMT if enabled, and to print NMT related info. That is useful, but there are some issues. > > On debug, it just asserts, since the empty reserved region we create to hold the output of the mmap-search is created with address=NULL: > > > (gdb) call pp(0x7ffff010b030) > > "Executing pp" > > Thread 2 "java" received signal SIGSEGV, Segmentation fault. 
> 0x00007ffff6721a71 in VirtualMemoryRegion::VirtualMemoryRegion (this=this at entry=0x7ffff5bb2620, addr=addr at entry=0x0, size=size at entry=0) at /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/services/virtualMemoryTracker.hpp:180 > 180 assert(addr != NULL, "Invalid address"); > > > On release we don't assert and get further, but the use of SafeFetch is slightly wrong. It will deny us any NMT data about p if *p==0: > > > if (CanUseSafeFetchN() && SafeFetchN((intptr_t*)p, 0) != 0) { > > > This patch: > - fixes uses of SafeFetch > - changes the mmap-region-search-code to not require an empty ReservedMemoryRegion in order to avoid triggering the assert in virtualMemoryTracker.hpp:180 > - adds a comment about the safe use of pp() in gdb (one needs to switch off signal handling of SIGSEGV for this to work) > > Tests: > - I tested manually that pp works with different levels of NMT (Linux x64) > - GHAs in process This pull request has now been integrated. Changeset: 010965c8 Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/010965c86ab39260b882df807c4f5d6420b20ca9 Stats: 97 lines in 5 files changed: 54 ins; 25 del; 18 mod 8281023: NMT integration into pp debug command does not work Reviewed-by: zgu, iklam ------------- PR: https://git.openjdk.java.net/jdk/pull/7297 From prappo at openjdk.java.net Thu Feb 3 14:58:12 2022 From: prappo at openjdk.java.net (Pavel Rappo) Date: Thu, 3 Feb 2022 14:58:12 GMT Subject: Integrated: 8281057: Fix doc references to overriding in JLS In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 16:19:01 GMT, Pavel Rappo wrote: > While looking into guts of javadoc comment inheritance, I noticed that a number of places in JDK seem to confuse JLS 8.4.6.** with JLS 8.4.8.**. > > Granted, "8.4.6 Method Throws" tangentially addresses overriding. However, I believe that the real target should be "8.4.8. Inheritance, Overriding, and Hiding" and its subsections. This pull request has now been integrated. 
Changeset: 1f926609 Author: Pavel Rappo URL: https://git.openjdk.java.net/jdk/commit/1f926609372c9b80dde831a014310a3729768c92 Stats: 18 lines in 5 files changed: 0 ins; 0 del; 18 mod 8281057: Fix doc references to overriding in JLS Reviewed-by: darcy, iris, dholmes, cjplummer ------------- PR: https://git.openjdk.java.net/jdk/pull/7311 From duke at openjdk.java.net Thu Feb 3 16:51:50 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Thu, 3 Feb 2022 16:51:50 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v17] In-Reply-To: <52o8K8q5wBP4HgBI3AljysgeR6tbogiOtQYu0VhWOAA=.80d5b306-f67f-4a87-836f-44bdbb0713f1@github.com> References: <52o8K8q5wBP4HgBI3AljysgeR6tbogiOtQYu0VhWOAA=.80d5b306-f67f-4a87-836f-44bdbb0713f1@github.com> Message-ID: On Wed, 2 Feb 2022 16:03:48 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. 
>> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work-in-progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Update copyrights to 2022 As requested in the RFE, added the new flag to the man page. Also updated the building.md instructions. However, I'm not sure how to add to the release notes - I can't find any files or a process.
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Thu Feb 3 16:51:48 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Thu, 3 Feb 2022 16:51:48 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v18] In-Reply-To: References: Message-ID: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. 
If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work-in-progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Documentation updates ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/6255d4c8..d97883b5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=17 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=16-17 Stats: 34 lines in 2 files changed: 33 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From lucy at openjdk.java.net Thu Feb 3 17:42:09 2022 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 3 Feb 2022 17:42:09 GMT Subject: RFR: 8281043: Intrinsify recursive ObjectMonitor locking for PPC64 [v3] In-Reply-To: <7ErBIaYMp6HAZqIyG-r8_B9EI3sw4hu3VzZ54SPYaKk=.3db6e7d9-cb10-4532-b1c9-19a6a3239f58@github.com> References:
<7ErBIaYMp6HAZqIyG-r8_B9EI3sw4hu3VzZ54SPYaKk=.3db6e7d9-cb10-4532-b1c9-19a6a3239f58@github.com> Message-ID: <0UtRVkT3nknIO6XWwzOhMs1SSZPNHfyeaNGkWY8qQkE=.6addc46e-77f2-4b19-8390-bc0497092a53@github.com> On Thu, 3 Feb 2022 10:16:45 GMT, Martin Doerr wrote: >> PPC64 implementation of JDK-8277180. >> >> `java -Xms4g -Xmx4g -jar dacapo-9.12-bach.jar h2 -s huge -t 1 -n 1 --max-iterations=35 --variance=5 --verbose --converge` >> >> Before this patch (2 runs): >> `===== DaCapo 9.12 h2 PASSED in 309753 msec =====` >> `===== DaCapo 9.12 h2 PASSED in 300755 msec =====` >> >> After: >> `===== DaCapo 9.12 h2 PASSED in 285144 msec =====` >> `===== DaCapo 9.12 h2 PASSED in 288255 msec =====` > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright years. Changes look good to me. Nice performance gain for such a small change! ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7305 From minqi at openjdk.java.net Thu Feb 3 18:07:12 2022 From: minqi at openjdk.java.net (Yumin Qi) Date: Thu, 3 Feb 2022 18:07:12 GMT Subject: Integrated: 8278753: Runtime crashes with access violation during JNI_CreateJavaVM call In-Reply-To: References: Message-ID: <3dNFLZSlmP3oTlqhbEvQrTbVPUmh43zxnRHqXoUxCR8=.da8cdf20-6b08-4c58-ba75-cb4981cd80ad@github.com> On Tue, 25 Jan 2022 00:20:19 GMT, Yumin Qi wrote: > Please review, > When jlink with --compress=2, zip is used to compress the files while doing copy. The user case failed to load zip.dll, since zip.dll is not set in PATH. This failure is after we get NULL from GetModuleHandle("zip.dll"), then do LoadLibrary("zip.dll") will have same result. > The fix is calling load_zip_library of ClassLoader first --- if zip library already loaded just return the cached handle for following usage, if not, load zip library and cached the handle. 
> > Tests: tier1,4,7 in test > Manually tested user case, and checked output of jimage list for jlinked files using --compress=2. > > Thanks > Yumin This pull request has now been integrated. Changeset: cda9c301 Author: Yumin Qi URL: https://git.openjdk.java.net/jdk/commit/cda9c3011beeec8df68e78e096132e712255ce1b Stats: 49 lines in 6 files changed: 18 ins; 14 del; 17 mod 8278753: Runtime crashes with access violation during JNI_CreateJavaVM call Reviewed-by: dholmes, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/7206 From duke at openjdk.java.net Thu Feb 3 19:38:18 2022 From: duke at openjdk.java.net (Yi-Fan Tsai) Date: Thu, 3 Feb 2022 19:38:18 GMT Subject: Integrated: 8251505: Use of types in compiler shared code should be consistent. In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 03:35:13 GMT, Yi-Fan Tsai wrote: > 8251505: Use of types in compiler shared code should be consistent. This pull request has now been integrated. Changeset: b6935dfb Author: Yi-Fan Tsai Committer: Paul Hohensee URL: https://git.openjdk.java.net/jdk/commit/b6935dfb86a1c011355d2dfb2140be26ec536351 Stats: 33 lines in 10 files changed: 2 ins; 0 del; 31 mod 8251505: Use of types in compiler shared code should be consistent. Reviewed-by: phh ------------- PR: https://git.openjdk.java.net/jdk/pull/7294 From lucy at openjdk.java.net Thu Feb 3 21:43:07 2022 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 3 Feb 2022 21:43:07 GMT Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames In-Reply-To: References: Message-ID: <4a2kmV7FQ6RflwAUiV4gyzghvRreLpLiF2DranH1LJI=.1032f99a-08eb-44b9-b932-34b68e745ac4@github.com> On Tue, 1 Feb 2022 17:22:57 GMT, Martin Doerr wrote: > s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details. Changes look good to me. Is there a chance to "officially" run some JFR jtreg tests? ------------- Changes requested by lucy (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7312 From dlong at openjdk.java.net Thu Feb 3 22:01:10 2022 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 3 Feb 2022 22:01:10 GMT Subject: RFR: 8271055: Crash during deoptimization with "assert(bb->is_reachable()) failed: getting result from unreachable basicblock" with -XX:+VerifyStack In-Reply-To: References: Message-ID: On Thu, 3 Feb 2022 04:11:38 GMT, Dean Long wrote: > Reproduced the problem with a new JASM test rather than relying on idiosyncrasies of javac. > The fix is to not look at the next instruction (which might be the beginning of an unreachable block) if the current instruction doesn't fall through (like "goto"!). Thanks Tobias and Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/7331 From dlong at openjdk.java.net Thu Feb 3 22:14:18 2022 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 3 Feb 2022 22:14:18 GMT Subject: Integrated: 8271055: Crash during deoptimization with "assert(bb->is_reachable()) failed: getting result from unreachable basicblock" with -XX:+VerifyStack In-Reply-To: References: Message-ID: On Thu, 3 Feb 2022 04:11:38 GMT, Dean Long wrote: > Reproduced the problem with a new JASM test rather than relying on idiosyncrasies of javac. > The fix is to not look at the next instruction (which might be the beginning of an unreachable block) if the current instruction doesn't fall through (like "goto"!). This pull request has now been integrated. 
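For readers of the archive, the rule described above (do not peek at the next instruction when the current one cannot fall through) can be illustrated with a small toy model. This is only a sketch under assumptions: the opcode enum and both helper functions are hypothetical and are not taken from the HotSpot sources or from the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy opcodes for illustration; this is not HotSpot's bytecode set.
enum class Op { Load, Add, Goto, Return, Throw };

// True if control can continue to the textually next instruction.
inline bool falls_through(Op op) {
  switch (op) {
    case Op::Goto:
    case Op::Return:
    case Op::Throw:
      return false;
    default:
      return true;
  }
}

// Returns the index of the next instruction that may safely be inspected,
// or -1 when the current instruction never reaches it. In the -1 case the
// following slot may be the start of an unreachable block, so it is skipped.
inline long next_to_inspect(const std::vector<Op>& code, std::size_t cur) {
  assert(cur < code.size());
  if (!falls_through(code[cur])) return -1;
  if (cur + 1 >= code.size()) return -1;
  return static_cast<long>(cur + 1);
}
```

With the toy sequence Load, Goto, Add, Return, the helper allows a peek after Load and after Add, but not after Goto or Return.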
Changeset: e44dc638 Author: Dean Long URL: https://git.openjdk.java.net/jdk/commit/e44dc638b8936b1b76ca9ddf9ece0c5c4705a19c Stats: 121 lines in 3 files changed: 120 ins; 0 del; 1 mod 8271055: Crash during deoptimization with "assert(bb->is_reachable()) failed: getting result from unreachable basicblock" with -XX:+VerifyStack Co-authored-by: Yi Yang Co-authored-by: Yi Yang Reviewed-by: vlivanov, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/7331 From dholmes at openjdk.java.net Thu Feb 3 22:19:10 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 3 Feb 2022 22:19:10 GMT Subject: RFR: 8251505: Use of types in compiler shared code should be consistent. [v8] In-Reply-To: References: Message-ID: On Thu, 3 Feb 2022 00:03:53 GMT, Yi-Fan Tsai wrote: >> 8251505: Use of types in compiler shared code should be consistent. > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Fix JVMCIEnv::get_long_at These changes also appear okay to me. Where we have changed from 32-bit to 64-bit types we will need to watch for issues with the 32-bit builds. ------------- PR: https://git.openjdk.java.net/jdk/pull/7294 From mdoerr at openjdk.java.net Thu Feb 3 23:01:45 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 3 Feb 2022 23:01:45 GMT Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames [v2] In-Reply-To: References: Message-ID: > s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Fix istate in stack range check. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7312/files - new: https://git.openjdk.java.net/jdk/pull/7312/files/f491da86..934e13c0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7312&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7312&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7312.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7312/head:pull/7312 PR: https://git.openjdk.java.net/jdk/pull/7312 From mdoerr at openjdk.java.net Thu Feb 3 23:01:46 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 3 Feb 2022 23:01:46 GMT Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 17:22:57 GMT, Martin Doerr wrote: > s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details. I had run a couple of JFR jtreg tests, but obviously not enough. I'll try more. "Officially", we don't test on s390 any more. ------------- PR: https://git.openjdk.java.net/jdk/pull/7312 From mdoerr at openjdk.java.net Fri Feb 4 09:18:10 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 4 Feb 2022 09:18:10 GMT Subject: RFR: 8281043: Intrinsify recursive ObjectMonitor locking for PPC64 [v3] In-Reply-To: <7ErBIaYMp6HAZqIyG-r8_B9EI3sw4hu3VzZ54SPYaKk=.3db6e7d9-cb10-4532-b1c9-19a6a3239f58@github.com> References: <7ErBIaYMp6HAZqIyG-r8_B9EI3sw4hu3VzZ54SPYaKk=.3db6e7d9-cb10-4532-b1c9-19a6a3239f58@github.com> Message-ID: On Thu, 3 Feb 2022 10:16:45 GMT, Martin Doerr wrote: >> PPC64 implementation of JDK-8277180.
>> >> `java -Xms4g -Xmx4g -jar dacapo-9.12-bach.jar h2 -s huge -t 1 -n 1 --max-iterations=35 --variance=5 --verbose --converge` >> >> Before this patch (2 runs): >> `===== DaCapo 9.12 h2 PASSED in 309753 msec =====` >> `===== DaCapo 9.12 h2 PASSED in 300755 msec =====` >> >> After: >> `===== DaCapo 9.12 h2 PASSED in 285144 msec =====` >> `===== DaCapo 9.12 h2 PASSED in 288255 msec =====` > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright years. Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/7305 From mdoerr at openjdk.java.net Fri Feb 4 09:18:11 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 4 Feb 2022 09:18:11 GMT Subject: Integrated: 8281043: Intrinsify recursive ObjectMonitor locking for PPC64 In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 13:23:42 GMT, Martin Doerr wrote: > PPC64 implementation of JDK-8277180. > > `java -Xms4g -Xmx4g -jar dacapo-9.12-bach.jar h2 -s huge -t 1 -n 1 --max-iterations=35 --variance=5 --verbose --converge` > > Before this patch (2 runs): > `===== DaCapo 9.12 h2 PASSED in 309753 msec =====` > `===== DaCapo 9.12 h2 PASSED in 300755 msec =====` > > After: > `===== DaCapo 9.12 h2 PASSED in 285144 msec =====` > `===== DaCapo 9.12 h2 PASSED in 288255 msec =====` This pull request has now been integrated. 
Changeset: 46c6c6f3 Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/46c6c6f308b5ec0ec3b762df4b76de555287474c Stats: 23 lines in 1 file changed: 8 ins; 3 del; 12 mod 8281043: Intrinsify recursive ObjectMonitor locking for PPC64 Reviewed-by: rrich, lucy ------------- PR: https://git.openjdk.java.net/jdk/pull/7305 From duke at openjdk.java.net Fri Feb 4 09:19:15 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Fri, 4 Feb 2022 09:19:15 GMT Subject: RFR: 8277204: Implementation of JEP 8264130: PAC-RET protection for Linux/AArch64 [v17] In-Reply-To: References: <52o8K8q5wBP4HgBI3AljysgeR6tbogiOtQYu0VhWOAA=.80d5b306-f67f-4a87-836f-44bdbb0713f1@github.com> Message-ID: On Thu, 3 Feb 2022 16:49:08 GMT, Alan Hayward wrote: > However, I'm not sure how to add to the release notes - I can't find any files or a process. Ok, This part I understand now :) ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From shade at openjdk.java.net Fri Feb 4 11:21:27 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 4 Feb 2022 11:21:27 GMT Subject: RFR: 8072070: Improve interpreter stack banging Message-ID: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. 
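To make the watermark idea concrete, here is a toy model of "bang only when needed". It is a sketch under stated assumptions, not the code in the pull request: the struct, its field names, the page size, the 20-page shadow zone, and the choice to keep banging (without moving the watermark) once the stack pointer is at or below the safe limit are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Toy model only. The stack grows downwards; addresses are plain integers.
struct ToyStackBanger {
  static constexpr std::uintptr_t kPageSize = 4096;  // assumed page size
  static constexpr std::size_t kShadowPages = 20;    // assumed shadow zone size

  // At or below this SP the shadow zone would reach the end of the stack.
  std::uintptr_t safe_limit;
  // Lowest SP for which the shadow zone has already been banged.
  // The maximum value means "nothing banged yet".
  std::uintptr_t growth_watermark = std::numeric_limits<std::uintptr_t>::max();

  explicit ToyStackBanger(std::uintptr_t stack_end)
      : safe_limit(stack_end + kShadowPages * kPageSize) {}

  // Returns how many pages this method entry has to touch.
  std::size_t on_method_entry(std::uintptr_t sp) {
    // Fast path: the zone below this SP was banged on an earlier, deeper entry.
    if (sp >= growth_watermark) return 0;
    // Safe to remember the new depth so that later entries can skip the work.
    if (sp > safe_limit) growth_watermark = sp;
    // Otherwise (assumption in this sketch) bang conservatively every time.
    return kShadowPages;
  }
};
```

In this model, repeated entries at the same or a shallower stack depth touch no pages at all; only an entry that is deeper than anything seen before pays for a full bang.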
The watermark scheme also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in the native case as well.

This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why the `native_call` argument persists, even though it is no longer used in the x86 case. There is also a new test group that I found useful when debugging on Windows; that group is going to go away before integration.

I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around.

I think it is fairly complete, and so would like to solicit more feedback and testing here.

Point runs on SPECjvm2008 with `-Xint` show huge improvements on half of the tests, without any regressions:

  compiler.compiler:  +77%
  compiler.sunflow:   +69%
  compress:           +166%
  crypto.rsa:         +15%
  crypto.signverify:  +70%
  mpegaudio:          +8%
  serial:             +50%
  sunflow:            +57%
  xml.transform:      +61%
  xml.validation:     +43%

My new `java.lang.invoke` benchmarks improve a lot as well:

  Benchmark              Mode  Cnt    Score   Error  Units

  # Mainline
  MHInvoke.methodHandle  avgt    5  799.671 ± 9.087  ns/op
  MHInvoke.plain         avgt    5  261.947 ± 1.421  ns/op
  VHGet.plain            avgt    5  231.372 ± 3.044  ns/op
  VHGet.varHandle        avgt    5  924.880 ± 6.026  ns/op

  # This WIP
  MHInvoke.methodHandle  avgt    5  240.456 ± 3.931  ns/op
  MHInvoke.plain         avgt    5   70.851 ± 1.986  ns/op
  VHGet.plain            avgt    5   52.506 ± 3.768  ns/op
  VHGet.varHandle        avgt    5  335.785 ± 4.398  ns/op

It also palpably improves startup even on a small HelloWorld, _even when compilers are present_:

  $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null

  Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs):

        22.06 msec task-clock               #    1.030 CPUs utilized          ( +- 0.04% )
           96      context-switches         #    4.353 K/sec                  ( +- 0.07% )
            7      cpu-migrations           #  333.181 /sec                   ( +- 0.32% )
        2,437      page-faults              #  110.469 K/sec                  ( +- 0.00% )
   78,763,038      cycles                   #    3.571 GHz                    ( +- 0.05% ) (77.30%)
    2,107,182      stalled-cycles-frontend  #    2.68% frontend cycles idle   ( +- 0.41% ) (77.40%)
    2,235,371      stalled-cycles-backend   #    2.84% backend cycles idle    ( +- 1.05% ) (71.39%)
   67,296,528      instructions             #    0.85 insn per cycle
                                            #    0.03 stalled cycles per insn ( +- 0.03% ) (89.79%)
   12,483,022      branches                 #  565.911 M/sec                  ( +- 0.01% ) (99.73%)
      384,412      branch-misses            #    3.08% of all branches        ( +- 0.07% ) (85.91%)

    0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% )

  $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null

  Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs):

        21.78 msec task-clock               #    1.031 CPUs utilized          ( +- 0.05% )
           98      context-switches         #    4.519 K/sec                  ( +- 0.07% )
            7      cpu-migrations           #  339.292 /sec                   ( +- 0.31% )
        2,434      page-faults              #  111.755 K/sec                  ( +- 0.00% )
   77,746,317      cycles                   #    3.569 GHz                    ( +- 0.05% ) (76.94%)
    2,143,121      stalled-cycles-frontend  #    2.76% frontend cycles idle   ( +- 0.45% ) (76.03%)
    2,059,440      stalled-cycles-backend   #    2.65% backend cycles idle    ( +- 1.11% ) (71.82%)
   66,742,892      instructions             #    0.86 insn per cycle
                                            #    0.03 stalled cycles per insn ( +- 0.03% ) (91.40%)
   12,494,797      branches                 #  573.634 M/sec                  ( +- 0.01% ) (99.80%)
      386,145      branch-misses            #    3.09% of all branches        ( +- 0.08% ) (85.56%)

    0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% )

Additional testing:
- [x] Linux x86_64 fastdebug, `tier1`
- [ ] Linux x86_64 fastdebug, `tier2`
- [ ] Linux x86_64 fastdebug, `tier3`
- [x]
Linux x86_32 fastdebug, `tier1` - [ ] Linux x86_32 fastdebug, `tier2` - [ ] Linux x86_32 fastdebug, `tier3` ------------- Commit messages: - Initial fairly complete fix Changes: https://git.openjdk.java.net/jdk/pull/7247/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7247&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8072070 Stats: 190 lines in 6 files changed: 169 ins; 4 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/7247.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7247/head:pull/7247 PR: https://git.openjdk.java.net/jdk/pull/7247 From mdoerr at openjdk.java.net Fri Feb 4 13:57:10 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 4 Feb 2022 13:57:10 GMT Subject: RFR: 8072070: Improve interpreter stack banging In-Reply-To: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Thu, 27 Jan 2022 18:42:15 GMT, Aleksey Shipilev wrote: > This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. > > The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. 
>
> This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why the `native_call` argument persists, even though it is no longer used in the x86 case. There is also a new test group that I found useful when debugging on Windows; that group is going to go away before integration.
>
> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around.
>
> I think it is fairly complete, and so would like to solicit more feedback and testing here.
>
> Point runs on SPECjvm2008 with `-Xint` show huge improvements on half of the tests, without any regressions:
>
>
>   compiler.compiler:  +77%
>   compiler.sunflow:   +69%
>   compress:           +166%
>   crypto.rsa:         +15%
>   crypto.signverify:  +70%
>   mpegaudio:          +8%
>   serial:             +50%
>   sunflow:            +57%
>   xml.transform:      +61%
>   xml.validation:     +43%
>
>
> My new `java.lang.invoke` benchmarks improve a lot as well:
>
>
>   Benchmark              Mode  Cnt    Score   Error  Units
>
>   # Mainline
>   MHInvoke.methodHandle  avgt    5  799.671 ± 9.087  ns/op
>   MHInvoke.plain         avgt    5  261.947 ± 1.421  ns/op
>   VHGet.plain            avgt    5  231.372 ± 3.044  ns/op
>   VHGet.varHandle        avgt    5  924.880 ± 6.026  ns/op
>
>   # This WIP
>   MHInvoke.methodHandle  avgt    5  240.456 ± 3.931  ns/op
>   MHInvoke.plain         avgt    5   70.851 ± 1.986  ns/op
>   VHGet.plain            avgt    5   52.506 ± 3.768  ns/op
>   VHGet.varHandle        avgt    5  335.785 ± 4.398  ns/op
>
>
> It also palpably improves startup even on a small HelloWorld, _even when compilers are present_:
>
>
>   $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null
>
>   Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs):
>
>         22.06 msec task-clock               #    1.030 CPUs utilized          ( +- 0.04% )
>            96      context-switches         #    4.353 K/sec                  ( +- 0.07% )
>             7      cpu-migrations           #  333.181 /sec                   ( +- 0.32% )
>         2,437      page-faults              #  110.469 K/sec                  ( +- 0.00% )
>    78,763,038      cycles                   #    3.571 GHz                    ( +- 0.05% ) (77.30%)
>     2,107,182      stalled-cycles-frontend  #    2.68% frontend cycles idle   ( +- 0.41% ) (77.40%)
>     2,235,371      stalled-cycles-backend   #    2.84% backend cycles idle    ( +- 1.05% ) (71.39%)
>    67,296,528      instructions             #    0.85 insn per cycle
>                                             #    0.03 stalled cycles per insn ( +- 0.03% ) (89.79%)
>    12,483,022      branches                 #  565.911 M/sec                  ( +- 0.01% ) (99.73%)
>       384,412      branch-misses            #    3.08% of all branches        ( +- 0.07% ) (85.91%)
>
>     0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% )
>
>   $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null
>
>   Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs):
>
>         21.78 msec task-clock               #    1.031 CPUs utilized          ( +- 0.05% )
>            98      context-switches         #    4.519 K/sec                  ( +- 0.07% )
>             7      cpu-migrations           #  339.292 /sec                   ( +- 0.31% )
>         2,434      page-faults              #  111.755 K/sec                  ( +- 0.00% )
>    77,746,317      cycles                   #    3.569 GHz                    ( +- 0.05% ) (76.94%)
>     2,143,121      stalled-cycles-frontend  #    2.76% frontend cycles idle   ( +- 0.45% ) (76.03%)
>     2,059,440      stalled-cycles-backend   #    2.65% backend cycles idle    ( +- 1.11% ) (71.82%)
>    66,742,892      instructions             #    0.86 insn per cycle
>                                             #    0.03 stalled cycles per insn ( +- 0.03% ) (91.40%)
>    12,494,797      branches                 #  573.634 M/sec                  ( +- 0.01% ) (99.80%)
>       386,145      branch-misses            #    3.09% of all branches        ( +- 0.08% ) (85.56%)
>
>     0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% )
>
>
> Additional testing:
> - [x] Linux x86_64 fastdebug,
`tier1` > - [ ] Linux x86_64 fastdebug, `tier2` > - [ ] Linux x86_64 fastdebug, `tier3` > - [x] Linux x86_32 fastdebug, `tier1` > - [ ] Linux x86_32 fastdebug, `tier2` > - [ ] Linux x86_32 fastdebug, `tier3` Hi Aleksey, thanks for working on the stack banging code. I wanted to do so for a long time, but couldn't make it, yet. Results are impressive! A quick question. Why can't we just use something like the following on linux? __ cmpptr(rsp, Address(r15_thread, JavaThread::stack_overflow_limit_offset())); __ jump_cc(Assembler::belowEqual, ExternalAddress(Interpreter::_throw_StackOverflowError_entry)); Is banging the shadow area strictly required on linux? Could be that it is needed on some OSes. ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From mdoerr at openjdk.java.net Fri Feb 4 15:45:48 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 4 Feb 2022 15:45:48 GMT Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames [v3] In-Reply-To: References: Message-ID: > s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Fix sender_sp. 

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7312/files
  - new: https://git.openjdk.java.net/jdk/pull/7312/files/934e13c0..6d9446a8

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7312&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7312&range=01-02

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7312.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7312/head:pull/7312

PR: https://git.openjdk.java.net/jdk/pull/7312

From lucy at openjdk.java.net Fri Feb 4 16:02:12 2022
From: lucy at openjdk.java.net (Lutz Schmidt)
Date: Fri, 4 Feb 2022 16:02:12 GMT
Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 4 Feb 2022 15:45:48 GMT, Martin Doerr wrote:

>> s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details.
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix sender_sp.

Yes, I know. I was hoping for support from IBM / Red Hat. Tyler Steele had been helpful in the past. He seems to have changed name. "backwaterred" is not known anymore.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7312

From shade at openjdk.java.net Fri Feb 4 16:02:13 2022
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Fri, 4 Feb 2022 16:02:13 GMT
Subject: RFR: 8072070: Improve interpreter stack banging
In-Reply-To:
References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com>
Message-ID:

On Fri, 4 Feb 2022 13:54:02 GMT, Martin Doerr wrote:

> A quick question. Why can't we just use something like the following on linux?
>
> ```
> __ cmpptr(rsp, Address(r15_thread, JavaThread::stack_overflow_limit_offset()));
> __ jump_cc(Assembler::belowEqual, ExternalAddress(Interpreter::_throw_StackOverflowError_entry));
> ```
>
> Is banging the shadow area strictly required on linux? Could be that it is needed on some OSes.

(There is a large comment in `stackOverflow.hpp` -- do you see blind spots there?)

My early patches were something like that. But the deeper I got into this, the more I realized it is safer to keep banging in order to cooperate with the rest of stack overflow machinery. For example, I am not at all sure that throwing the SOE when below `stack_overflow_limit` works well with reserved zone handling. It was probably okay when we only had the yellow+red zones.

AFAIU, the only OS that needs to bang page by page to commit stacks is Windows; got some funky GHA failures without it. But, given how the watermark code effectively bangs each part of the stack once, I don't see a reason to bother with OS-specific code here. We can keep "overbanging" on Linux, and pay little cost for it. Same with `native_call`-s.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7247

From mdoerr at openjdk.java.net Fri Feb 4 16:27:09 2022
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Fri, 4 Feb 2022 16:27:09 GMT
Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 4 Feb 2022 15:45:48 GMT, Martin Doerr wrote:

>> s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details.
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix sender_sp.

@backwaterred is still there. But now with picture and real name.
I've got a lot of the JFR jtreg tests passing, but I'll try to run more over the weekend. Additional testing would still be appreciated, though.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7312

From mdoerr at openjdk.java.net Fri Feb 4 17:28:09 2022
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Fri, 4 Feb 2022 17:28:09 GMT
Subject: RFR: 8072070: Improve interpreter stack banging
In-Reply-To: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com>
References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com>
Message-ID:

On Thu, 27 Jan 2022 18:42:15 GMT, Aleksey Shipilev wrote:

> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. [...]
>
> Additional testing:
> - [x] Linux x86_64 fastdebug, `tier1`
> - [x] Linux x86_64 fastdebug, `tier2`
> - [ ] Linux x86_64 fastdebug, `tier3`
> - [x] Linux x86_32 fastdebug, `tier1`
> - [x] Linux x86_32 fastdebug, `tier2`
> - [x] Linux x86_32 fastdebug, `tier3`

I think it would be interesting to figure out if we can let the linux kernel do all the stack management work for us and avoid stack banging, protected pages etc. inside of hotspot completely. But that may be beyond the scope of your PR. (Windows is a different story.)
I hope that I can find time to figure it out at some point of time.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7247

From duke at openjdk.java.net Fri Feb 4 18:25:13 2022
From: duke at openjdk.java.net (Tyler Steele)
Date: Fri, 4 Feb 2022 18:25:13 GMT
Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 4 Feb 2022 15:45:48 GMT, Martin Doerr wrote:

>> s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details.
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix sender_sp.

Hello. I am happy to do some official testing. I have set up a run for tier1 and jfr tests on our s390 machines. I will let you know what I find.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7312

From harold.seigel at oracle.com Fri Feb 4 20:24:42 2022
From: harold.seigel at oracle.com (Harold Seigel)
Date: Fri, 4 Feb 2022 15:24:42 -0500
Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization
In-Reply-To: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com>
References: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com>
Message-ID:

Information on how CPUs are calculated can be found here: https://bugs.openjdk.java.net/browse/JDK-8197867

Harold

On 2/3/2022 2:30 AM, Ioi Lam wrote:
> Please see the bug report [1] for detailed description and test cases.
>
> I'd like to have some discussion before we can decide what to do.
>
> I discovered this issue when analyzing JDK-8279484 [2]. Under
> Kubernetes (minikube), Runtime.availableProcessors() returns 1,
> despite the fact that the machine has 32 CPUs, the Kubernetes node has
> a single deployment, and no CPU limits were set.
>
> Specifically, I want to understand why the JDK is using
> CgroupSubsystem::cpu_shares() to limit the number of CPUs used by the
> Java process.
>
> In cgroup, there are other ways that are designed specifically for
> limiting the number of CPUs, i.e., CgroupSubsystem::cpu_quota(). Why
> is using cpu_quota() alone not enough? Why did we choose the current
> approach of considering both cpu_quota() and cpu_shares()?
>
> My guess is that sometimes people don't limit the actual number of
> CPUs per container, but instead use CPU Shares to set the relative
> scheduling priority between containers.
>
> I.e., they run "docker run --cpu-shares=1234" without using the
> "--cpus" flag.
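To make the question concrete, here is a minimal sketch of a share-based CPU limit of the kind being discussed. It assumes 1024 shares per CPU, rounding up, and the value 1024 being read as "no limit"; `cpus_from_shares`, `effective_cpus`, and the constants are hypothetical names, and this is not the JDK's `CgroupSubsystem` code.

```cpp
#include <algorithm>

constexpr int kPerCpuShares = 1024;  // assumed scaling: 1024 shares == 1 CPU
constexpr int kNoLimit = -1;         // sentinel meaning "no share-based limit"

// Hypothetical helper: converts a cpu.shares value into a CPU count.
// The value 1024 (and, as an extra assumption, any non-positive value)
// is treated as "not set".
inline int cpus_from_shares(int shares) {
    if (shares <= 0 || shares == kPerCpuShares) {
        return kNoLimit;
    }
    return (shares + kPerCpuShares - 1) / kPerCpuShares;  // round up
}

// Hypothetical helper: caps the host CPU count by the share-derived limit.
inline int effective_cpus(int host_cpus, int shares) {
    const int limit = cpus_from_shares(shares);
    return (limit == kNoLimit) ? host_cpus : std::min(host_cpus, limit);
}
```

With these assumptions, a 32-CPU host with any share value from 1 to 1023 ends up with `effective_cpus(32, shares) == 1`, which shows how a purely relative weight turns into a hard, static cap regardless of what else is running.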
>
> If this is indeed the reason, I can understand the (good) intention,
> but the solution seems awfully insufficient.
>
> CPU Shares is a *relative* number. How much CPU is allocated to you
> depends on
>
> - how many other processes are actively running
> - what their CPU Shares are
>
> The above information can change dynamically, as other processes may
> be added or removed, and they can change between active and idle states.
>
> However, the JVM treats CPU Shares as an *absolute/static* number, and
> sets the CPU quota of the current process using this very simplistic
> formula.
>
> Value of /sys/fs/cgroup/cpu.shares -> cpu quota:
>
>     1023 -> 1 CPU
>     1024 -> no limit (huh??)
>     2048 -> 2 CPUs
>     4096 -> 4 CPUs
>
> This seems just wrong to me. There's no way you can get a "correct"
> result without knowing anything about other processes that are running
> at the same time.
>
> The net effect is when Java is running under a container, more likely
> than not, the JVM will limit itself to a single CPU. This seems really
> inefficient to me.
>
> What should we do?
>
> Thanks
> - Ioi
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8281181
> [2] https://bugs.openjdk.java.net/browse/JDK-8279484

From xliu at openjdk.java.net Sat Feb 5 01:26:07 2022
From: xliu at openjdk.java.net (Xin Liu)
Date: Sat, 5 Feb 2022 01:26:07 GMT
Subject: RFR: 8072070: Improve interpreter stack banging
In-Reply-To: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com>
References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com>
Message-ID: <2ycwqbNL3ATeqhegUjGcUmLVYi4-2IOhATj8QK5qQNw=.86c91bd8-9d24-42ff-aa1e-8391742850c5@github.com>

On Thu, 27 Jan 2022 18:42:15 GMT, Aleksey Shipilev wrote:

> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code.
> [...]
>
> Additional testing:
> - [x] Linux x86_64 fastdebug, `tier1`
> - [x] Linux x86_64 fastdebug, `tier2`
> - [ ] Linux x86_64 fastdebug, `tier3`
> - [x] Linux x86_32 fastdebug, `tier1`
> - [x] Linux x86_32 fastdebug, `tier2`
> - [x] Linux x86_32 fastdebug, `tier3`

src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp line 715:

> 713: }
> 714:
> 715: void TemplateInterpreterGenerator::bang_stack_shadow_pages(bool native_call) {

The watermark algorithm should also work on other architectures such as aarch64, right?

-------------

PR: https://git.openjdk.java.net/jdk/pull/7247

From xliu at openjdk.java.net Sat Feb 5 01:32:09 2022
From: xliu at openjdk.java.net (Xin Liu)
Date: Sat, 5 Feb 2022 01:32:09 GMT
Subject: RFR: 8072070: Improve interpreter stack banging
In-Reply-To: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com>
References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com>
Message-ID:

On Thu, 27 Jan 2022 18:42:15 GMT, Aleksey Shipilev wrote:

> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. [...]
>
> Additional testing:
> - [x] Linux x86_64 fastdebug, `tier1`
> - [x] Linux x86_64 fastdebug, `tier2`
> - [ ] Linux x86_64 fastdebug, `tier3`
> - [x] Linux x86_32 fastdebug, `tier1`
> - [x] Linux x86_32 fastdebug, `tier2`
> - [x] Linux x86_32 fastdebug, `tier3`

Since this PR touches stackOverflow.hpp, could you also take a look at this? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/stackOverflow.cpp#L66

We actually get the page size from os. Why do we need alignment = 4k?

-------------

PR: https://git.openjdk.java.net/jdk/pull/7247

From shade at openjdk.java.net Sat Feb 5 07:31:40 2022
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Sat, 5 Feb 2022 07:31:40 GMT
Subject: RFR: 8072070: Improve interpreter stack banging
In-Reply-To:
References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com>
Message-ID:

On Sat, 5 Feb 2022 01:28:47 GMT, Xin Liu wrote:

> Since this PR touches stackOverflow.hpp, could you also take a look at this? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/stackOverflow.cpp#L66
>
> We actually get the page size from os. Why do we need alignment = 4k?

Look here: https://github.com/openjdk/jdk/blob/48523b090886f7b24ed4009f0c150efaa6f7b056/src/hotspot/share/runtime/stackOverflow.cpp#L42-L45 -- the `StackYellowPages`, `StackRedPages`, `StackShadowPages` are defined as 4K pages. It should probably be called `unit`, not `alignment`. I'd like to avoid scope creep for this PR, so that's for another day.

> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp line 715:
>
>> 713: }
>> 714:
>> 715: void TemplateInterpreterGenerator::bang_stack_shadow_pages(bool native_call) {
>
> The watermark algorithm should also work on other architectures such as aarch64, right?

Yes, as I stated in PR text: "This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later."

-------------

PR: https://git.openjdk.java.net/jdk/pull/7247

From xliu at openjdk.java.net Sat Feb 5 08:17:21 2022
From: xliu at openjdk.java.net (Xin Liu)
Date: Sat, 5 Feb 2022 08:17:21 GMT
Subject: RFR: 8072070: Improve interpreter stack banging
In-Reply-To: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com>
References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com>
Message-ID:

On Thu, 27 Jan 2022 18:42:15 GMT, Aleksey Shipilev wrote:

> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. [...]
>
> Additional testing:
> - [x] Linux x86_64 fastdebug, `tier1`
> - [x] Linux x86_64 fastdebug, `tier2`
> - [x] Linux x86_64 fastdebug, `tier3`
> - [x] Linux x86_32 fastdebug, `tier1`
> - [x] Linux x86_32 fastdebug, `tier2`
> - [x] Linux x86_32 fastdebug, `tier3`

src/hotspot/share/runtime/stackOverflow.hpp line 166:

> 164: //     into adjacent thread stack, or even into other readable memory. This would potentially
> 165: //     pass the check by accident.
> 166: //  c) Allow for incremental stack growth by handling traps from not yet committed thread

I failed to understand why we have to do "incremental stack growth" here. Why can't we touch the last page?

    __ bang_stack_with_offset(n_shadow_pages*page_size);

The entire shadow zone is mapped. Touching it causes commit, page faults or SEGV. First 2 events are transparent for the userspace process. Hotspot will trap into the signal handler if `bang_stack_shadow_pages` does cross shadow_zone_safe_limit().

`rsp + n_shadow_pages * page_size` falls into 2 possibilities:
1. red zone: the program is about to die anyway.
2. yellow or reserved zones: both are recoverable.
I feel it's not necessary to touch pages from 1 to n_shadow_pages-1. The side effect is same as touching the last page directly. ps: I tried this [idea](https://github.com/navyxliu/jdk/runs/5075962312?check_suite_focus=true). 2 failures are found on Windows. I guess the premise that the shadow zone is mapped is false on Windows. compiler/interpreter/cr7116216/StackOverflow.java compiler/uncommontrap/UncommonTrapStackBang.java ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From xliu at openjdk.java.net Sat Feb 5 09:21:09 2022 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 5 Feb 2022 09:21:09 GMT Subject: RFR: 8072070: Improve interpreter stack banging In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Sat, 5 Feb 2022 08:13:34 GMT, Xin Liu wrote: >> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. >> >> The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. >> >> This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. 
There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration. >> >> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around. >> >> I think it is fairly complete, and so would like to solicit more feedback and testing here. >> >> Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions: >> >> >> compiler.compiler: +77% >> compiler.sunflow: +69% >> compress: +166% >> crypto.rsa: +15% >> crypto.signverify: +70% >> mpegaudio: +8% >> serial: +50% >> sunflow: +57% >> xml.transform: +61% >> xml.validation: +43% >> >> >> My new `java.lang.invoke` benchmarks improve a lot as well: >> >> >> Benchmark Mode Cnt Score Error Units >> >> # Mainline >> MHInvoke.methodHandle avgt 5 799.671 ? 9.087 ns/op >> MHInvoke.plain avgt 5 261.947 ? 1.421 ns/op >> VHGet.plain avgt 5 231.372 ? 3.044 ns/op >> VHGet.varHandle avgt 5 924.880 ? 6.026 ns/op >> >> # This WIP >> MHInvoke.methodHandle avgt 5 240.456 ? 3.931 ns/op >> MHInvoke.plain avgt 5 70.851 ? 1.986 ns/op >> VHGet.plain avgt 5 52.506 ? 3.768 ns/op >> VHGet.varHandle avgt 5 335.785 ? 
4.398 ns/op >> >> >> It also palpably improves startup even on small HelloWorld, _even when compilers are present_: >> >> >> $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% ) >> 96 context-switches # 4.353 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 333.181 /sec ( +- 0.32% ) >> 2,437 page-faults # 110.469 K/sec ( +- 0.00% ) >> 78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%) >> 2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%) >> 2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%) >> 67,296,528 instructions # 0.85 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%) >> 12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%) >> 384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%) >> >> 0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% ) >> >> $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% ) >> 98 context-switches # 4.519 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 339.292 /sec ( +- 0.31% ) >> 2,434 page-faults # 111.755 K/sec ( +- 0.00% ) >> 77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%) >> 2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%) >> 2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%) >> 66,742,892 instructions # 0.86 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%) >> 12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%) >> 386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%) >> >> 0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% ) >> >> >> Additional 
testing:
>> - [x] Linux x86_64 fastdebug, `tier1`
>> - [x] Linux x86_64 fastdebug, `tier2`
>> - [x] Linux x86_64 fastdebug, `tier3`
>> - [x] Linux x86_32 fastdebug, `tier1`
>> - [x] Linux x86_32 fastdebug, `tier2`
>> - [x] Linux x86_32 fastdebug, `tier3`
>
> src/hotspot/share/runtime/stackOverflow.hpp line 166:
>
>> 164: // into adjacent thread stack, or even into other readable memory. This would potentially
>> 165: // pass the check by accident.
>> 166: // c) Allow for incremental stack growth by handling traps from not yet committed thread
>
> I failed to understand why we have to do "incremental stack growth" here. Why can't we touch the last page?
>
> __ bang_stack_with_offset(n_shadow_pages*page_size);
>
>
> The entire shadow zone is mapped. Touching it causes commit, page faults or SEGV. First 2 events are transparent for the userspace process.
>
> Hotspot will trap into the signal handler if `bang_stack_shadow_pages` does cross shadow_zone_safe_limit(). `rsp + n_shadow_pages * page_size` falls into 2 possibilities:
> 1. red zone: the program is about to die anyway.
> 2. yellow reserved zones, both are recoverable.
>
> I feel it's not necessary to touch pages from 1 to n_shadow_pages-1. The side effect is the same as touching the last page directly.
>
> ps: I tried this [idea](https://github.com/navyxliu/jdk/runs/5075962312?check_suite_focus=true). 2 failures are found on Windows. I guess the premise that the shadow zone is mapped is false on Windows.
>
> compiler/interpreter/cr7116216/StackOverflow.java
> compiler/uncommontrap/UncommonTrapStackBang.java

I read this blog post and I need to take back my comment:
https://pangin.pro/posts/stack-overflow-handling

Now I think the interpreter has to do linear probing to make sure HotSpot executes Java programs correctly. reserve_zone has a special meaning. Further, if rsp is very close to shadow_zone_safe_limit(), the so-called last page may land past the red zone.
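
To make the linear-probing argument concrete, here is a toy Python model. It is only a sketch and not HotSpot code: the page numbering, the zone sizes (`GUARD_PAGES`, `SHADOW_PAGES`) and all function names are assumptions made up for illustration. It treats any page below the thread stack as "foreign" memory that may happen to be readable, which is the worst case described in the quoted `stackOverflow.hpp` comment.

```python
# Toy model, not HotSpot code. Page 0 is the lowest page of the thread stack
# and the stack grows toward lower page numbers. Sizes are illustrative only.
GUARD_PAGES = 4      # red + yellow + reserved zones together (assumed)
SHADOW_PAGES = 20    # shadow zone size in pages (assumed)


def page_kind(page):
    """Classify a page index; negative pages lie outside this thread's stack."""
    if page < 0:
        return "foreign"   # some other mapping; assumed readable, so no trap
    if page < GUARD_PAGES:
        return "guard"     # protected page; touching it traps
    return "usable"


def bang_last_page_only(sp_page):
    """Touch only the page SHADOW_PAGES below sp; report whether it trapped."""
    return page_kind(sp_page - SHADOW_PAGES) == "guard"


def bang_linearly(sp_page):
    """Touch each page from sp-1 down to sp-SHADOW_PAGES; stop at the first trap."""
    for offset in range(1, SHADOW_PAGES + 1):
        if page_kind(sp_page - offset) == "guard":
            return True
    return False


if __name__ == "__main__":
    for sp in (30, 22, 10):
        print(sp, bang_last_page_only(sp), bang_linearly(sp))
```

With the stack pointer at page 10 in this model, the single far probe lands on page -10, outside the stack, and sees no trap, while the linear walk reaches guard page 3 first. That is the situation where the "last page" jumps over the guard zones because they are smaller than the shadow zone.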
------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From duke at openjdk.java.net Sat Feb 5 15:40:33 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Sat, 5 Feb 2022 15:40:33 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts Message-ID: Hi, This patch implements the unsigned upcast intrinsics in x86, which are used in vector lane-wise reinterpreting operations. Thank you very much. ------------- Commit messages: - unsigned cast intrinsics Changes: https://git.openjdk.java.net/jdk/pull/7358/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7358&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8278173 Stats: 494 lines in 16 files changed: 435 ins; 24 del; 35 mod Patch: https://git.openjdk.java.net/jdk/pull/7358.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7358/head:pull/7358 PR: https://git.openjdk.java.net/jdk/pull/7358 From shade at openjdk.java.net Sun Feb 6 07:28:08 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sun, 6 Feb 2022 07:28:08 GMT Subject: RFR: 8072070: Improve interpreter stack banging In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Sat, 5 Feb 2022 09:18:17 GMT, Xin Liu wrote: >> src/hotspot/share/runtime/stackOverflow.hpp line 166: >> >>> 164: // into adjacent thread stack, or even into other readable memory. This would potentially >>> 165: // pass the check by accident. >>> 166: // c) Allow for incremental stack growth by handling traps from not yet committed thread >> >> I failed to understand why we have to do "incremental stack growth" here. Why can't use touch the last page? >> >> __ bang_stack_with_offset(n_shadow_pages*page_size); >> >> >> The entire shadow zone is mapped. Touching it causes commit, page faults or SEGV. First 2 events are transparent for the userspace process. 
>> >> Hotspot will trap into the signal handler if `bang_stack_shadow_pages` does cross shadow_zone_safe_limit(). `rsp + n_shadow_pages * page_size` falls into 2 possibilities: >> 1. red zone: the program is about to die anyway. >> 2. yellow reserved zones, both are recoverable. >> >> I feel it's not necessary to touch pages from 1 to n_shadow_pages-1. The side effect is same as touching the last page directly. >> >> ps: I tried this [idea](https://github.com/navyxliu/jdk/runs/5075962312?check_suite_focus=true). 2 failures are found on Windows. I guess the premise that the shadow zone is mapped is false on Windows. >> >> compiler/interpreter/cr7116216/StackOverflow.java >> compiler/uncommontrap/UncommonTrapStackBang.java > > I read this blogpost and I need to take back my comment. > https://pangin.pro/posts/stack-overflow-handling > > now I think interpreter has to do linear probing to make sure HotSpot executes Java programs correctly. reserve_zone has special meaning. Further, if rsp is very close to the shadow_zone_safe_limit(), so-called last page may surpass red zone. Yes, that's exactly what point "(c)" is about. ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From shade at openjdk.java.net Sun Feb 6 08:03:39 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Sun, 6 Feb 2022 08:03:39 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v2] In-Reply-To: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: > This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. 
> > The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. > > This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration. > > I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around. > > I think it is fairly complete, and so would like to solicit more feedback and testing here. > > Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions: > > > compiler.compiler: +77% > compiler.sunflow: +69% > compress: +166% > crypto.rsa: +15% > crypto.signverify: +70% > mpegaudio: +8% > serial: +50% > sunflow: +57% > xml.transform: +61% > xml.validation: +43% > > > My new `java.lang.invoke` benchmarks improve a lot as well: > > > Benchmark Mode Cnt Score Error Units > > # Mainline > MHInvoke.methodHandle avgt 5 799.671 ? 9.087 ns/op > MHInvoke.plain avgt 5 261.947 ? 1.421 ns/op > VHGet.plain avgt 5 231.372 ? 3.044 ns/op > VHGet.varHandle avgt 5 924.880 ? 6.026 ns/op > > # This WIP > MHInvoke.methodHandle avgt 5 240.456 ? 
3.931 ns/op > MHInvoke.plain avgt 5 70.851 ? 1.986 ns/op > VHGet.plain avgt 5 52.506 ? 3.768 ns/op > VHGet.varHandle avgt 5 335.785 ? 4.398 ns/op > > > It also palpably improves startup even on small HelloWorld, _even when compilers are present_: > > > $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null > > Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs): > > 22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% ) > 96 context-switches # 4.353 K/sec ( +- 0.07% ) > 7 cpu-migrations # 333.181 /sec ( +- 0.32% ) > 2,437 page-faults # 110.469 K/sec ( +- 0.00% ) > 78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%) > 2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%) > 2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%) > 67,296,528 instructions # 0.85 insn per cycle > # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%) > 12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%) > 384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%) > > 0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% ) > > $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null > > Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs): > > 21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% ) > 98 context-switches # 4.519 K/sec ( +- 0.07% ) > 7 cpu-migrations # 339.292 /sec ( +- 0.31% ) > 2,434 page-faults # 111.755 K/sec ( +- 0.00% ) > 77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%) > 2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%) > 2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%) > 66,742,892 instructions # 0.86 insn per cycle > # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%) > 12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%) > 386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) 
(85.56%) > > 0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% ) > > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1` > - [x] Linux x86_64 fastdebug, `tier2` > - [x] Linux x86_64 fastdebug, `tier3` > - [x] Linux x86_32 fastdebug, `tier1` > - [x] Linux x86_32 fastdebug, `tier2` > - [x] Linux x86_32 fastdebug, `tier3` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Rectify comment "(c)" ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7247/files - new: https://git.openjdk.java.net/jdk/pull/7247/files/b1ed28f8..c3983819 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7247&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7247&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7247.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7247/head:pull/7247 PR: https://git.openjdk.java.net/jdk/pull/7247 From david.holmes at oracle.com Mon Feb 7 01:12:21 2022 From: david.holmes at oracle.com (David Holmes) Date: Mon, 7 Feb 2022 11:12:21 +1000 Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization In-Reply-To: <44ce9669-71cc-0c20-ecbf-265845626820@oracle.com> References: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> <44ce9669-71cc-0c20-ecbf-265845626820@oracle.com> Message-ID: Just for the record ... On 3/02/2022 7:19 pm, David Holmes wrote: > Hi Ioi, > > For the benefit of the mailing list discussion ... > > On 3/02/2022 5:30 pm, Ioi Lam wrote: >> Please see the bug report [1] for detailed description and test cases. >> >> I'd like to have some discussion before we can decide what to do. >> >> I discovered this issue when analyzing JDK-8279484 [2]. 
Under >> Kubernetes (minikube), Runtime.availableProcessors() returns 1, >> despite that the fact the machine has 32 CPUs, the Kubernetes node has >> a single deployment, and no CPU limits were set. >> >> Specifically, I want to understand why the JDK is using >> CgroupSubsystem::cpu_shares() to limit the number of CPUs used by the >> Java process. > > Because we were asked to by customers deploying in containers. > >> In cgroup, there are other ways that are designed specifically for >> limiting the number of CPUs, i.e., CgroupSubsystem::cpu_quota(). Why >> is using cpu_quota() alone not enough? Why did we choose the current >> approach of considering both cpu_quota() and cpu_shares()? > > Because people were using both (whether that made sense or not) and so > we needed a policy on what to do if both were set. > >> My guess is that sometimes people don't limit the actual number of >> CPUs per container, but instead use CPU Shares to set the relative >> scheduling priority between containers. >> >> I.e., they run "docker run --cpu-shares=1234" without using the >> "--cpus" flag. >> >> If this is indeed the reason, I can understand the (good) intention, >> but the solution seems awfully insufficient. >> >> CPU Shares is a *relative* number. How much CPU is allocated to you >> depends on >> >> - how many other processes are actively running >> - what their CPU Shares are >> >> The above information can change dynamically, as other processes may >> be added or removed, and they can change between active and idle states. >> >> However, the JVM treats CPU Shares as an *absolute/static* number, and >> sets the CPU quota of the current process using this very simplistic >> formula. > > From old discussion and the code I believe the thought was that share > was relative to the the per-cpu default shares of 1024. So we use that > to determine the fraction of each CPU that should be assigned, and we > should then use that to determine the available number of CPUs. 
But that
> isn't what we actually do - we only calculate the fraction and round it
> up to get the number of CPUs and that is wrong (and typically only gives
> 1 cpu because shares < 1024). I speculate that what was intended was to
> map from having an X% share of each CPU, to instead having access to X%
> of the total CPUs (at 100% of each). Mathematically this has some basis
> but it actually makes no practical sense from a throughput or response
> time perspective. If I'm allowed 50% of the CPU per time period to do my
> calculations, I want 100% of each CPU for half of the period as that
> potentially minimises the elapsed time till I have a result.
>
>> Value of /sys/fs/cgroup/cpu.shares -> cpu quota:
>>
>>      1023 -> 1 CPU
>>      1024 -> no limit (huh??)
>>      2048 -> 2 CPUs
>>      4096 -> 4 CPUs
>>
>> This seems just wrong to me. There's no way you can get a "correct"
>> result without knowing anything about other processes that are running
>> at the same time.
>
> As I said above and in the bug report I think this was an error and the
> intent was to then multiply by the number of actual processors.

No, it was not an error. See the discussion Severin referenced:

http://mail.openjdk.java.net/pipermail/hotspot-dev/2019-January/036093.html

David
-----

>> The net effect is when Java is running under a container, more likely
>> than not, the JVM will limit itself to a single CPU. This seems really
>> inefficient to me.
>
> Yes.
>
>> What should we do?
>
> We could just adjust the calculation as I suggested.
>
> Or, given that share aka weight is meaningless without knowing the total
> weight in the system we could just ignore it. The app then gets access
> to all cpu's and it is up to the container to track actual usage and
> impose any limits configured.
> > I've always thought that these cgroups mechanisms were fundamentally > flawed and that if the intent was to define a resource limited > environment, then the environment should report what resources were > available by the normal APIs. They got this right with cpu-sets by > integrating with sched_getaffinity; but for shares and quotas it has > been left to the applications to try and figure out what that should > mean - and that makes no sense to me. > > Cheers, > David > >> Thanks >> - Ioi >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8281181 >> [2] https://bugs.openjdk.java.net/browse/JDK-8279484 From ioi.lam at oracle.com Mon Feb 7 04:16:30 2022 From: ioi.lam at oracle.com (Ioi Lam) Date: Sun, 6 Feb 2022 20:16:30 -0800 Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization In-Reply-To: <5dbfb77029a00d67542a9104855b2d98a3d8ce5e.camel@redhat.com> References: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> <5dbfb77029a00d67542a9104855b2d98a3d8ce5e.camel@redhat.com> Message-ID: <587acce6-dd30-1f78-caf6-17925c32cae6@oracle.com> On 2/3/2022 3:29 AM, Severin Gehwolf wrote: > Hi Ioi, > > On Wed, 2022-02-02 at 23:30 -0800, Ioi Lam wrote: >> Please see the bug report [1] for detailed description and test cases. >> >> I'd like to have some discussion before we can decide what to do. >> >> I discovered this issue when analyzing JDK-8279484 [2]. Under Kubernetes >> (minikube), Runtime.availableProcessors() returns 1, despite that the >> fact the machine has 32 CPUs, the Kubernetes node has a single >> deployment, and no CPU limits were set. > From looking at the bug it would be good to know why a cpu.weight value > of 1 is being obverved. The default is 100. I.e. 
if it is really unset: > > $ sudo docker run --rm -v $(pwd)/jdk17:/opt/jdk:z fedora:35 /opt/jdk/bin/java -Xlog:os+container=trace --version > [0.000s][trace][os,container] OSContainer::init: Initializing Container Support > [0.001s][debug][os,container] Detected cgroups v2 unified hierarchy > [0.001s][trace][os,container] Path to /memory.max is /sys/fs/cgroup//memory.max > [0.001s][trace][os,container] Raw value for memory limit is: max > [0.001s][trace][os,container] Memory Limit is: Unlimited > [0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max > [0.001s][trace][os,container] Raw value for CPU quota is: max > [0.001s][trace][os,container] CPU Quota is: -1 > [0.001s][trace][os,container] Path to /cpu.max is /sys/fs/cgroup//cpu.max > [0.001s][trace][os,container] CPU Period is: 100000 > [0.001s][trace][os,container] Path to /cpu.weight is /sys/fs/cgroup//cpu.weight > [0.001s][trace][os,container] Raw value for CPU shares is: 100 > [0.001s][debug][os,container] CPU Shares is: -1 > [0.001s][trace][os,container] OSContainer::active_processor_count: 4 > [0.001s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4 > [0.001s][debug][os,container] container memory limit unlimited: -1, using host value > [0.001s][debug][os,container] container memory limit unlimited: -1, using host value > [0.002s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4 > [0.007s][debug][os,container] container memory limit unlimited: -1, using host value > [0.014s][trace][os,container] CgroupSubsystem::active_processor_count (cached): 4 > [0.022s][trace][os,container] Path to /memory.max is /sys/fs/cgroup//memory.max > [0.022s][trace][os,container] Raw value for memory limit is: max > [0.022s][trace][os,container] Memory Limit is: Unlimited > [0.022s][debug][os,container] container memory limit unlimited: -1, using host value > openjdk 17.0.2-internal 2022-01-18 > OpenJDK Runtime Environment (build 
17.0.2-internal+0-adhoc.sgehwolf.jdk17u)
> OpenJDK 64-Bit Server VM (build 17.0.2-internal+0-adhoc.sgehwolf.jdk17u, mixed mode, sharing)

In JDK-8279484, the JVM is launched by Kubernetes, which manages CPU
resources with the concept of "request" and "limit".

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/

Quote: "Containers cannot use more CPU than the configured limit.
        Provided the system has CPU time free, a container
        is guaranteed to be allocated as much CPU as it requests."

So "CPU request" is a guaranteed minimum. For example, if you have a
container that requests 6 CPUs, but all the hosts in your cluster have
no more than 4 CPUs each, then this container will never be deployed by
Kubernetes, because the minimum of 6 CPUs cannot be guaranteed.

Consider the following 4 cases:

(1) You specify both "cpu request" and "cpu limit"
(2) You specify only "cpu limit" -> Kubernetes will set the
    "cpu request" to be the same as the limit.
(3) If you specify only "cpu request", Kubernetes will set
    the "cpu limit" to a default value that's not smaller
    than the request.
(4) Neither "cpu request" nor "cpu limit" is set

(For details about the defaults, see
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/ )

In the first 3 cases, the JVM (in cgroupv1) will see that both
cpu.cfs_quota_us and cpu.shares are set. The cpu.shares will be ignored
(due to the PreferContainerQuotaForCPUCount flag. See JDK-8197867).

Case (4) is the cause for the bug in JDK-8279484

Kubernetes set the cpu.cfs_quota_us to 0 (no limit) and cpu.shares to 2.
This means:

- This container is guaranteed a minimum amount of CPU resources
- If no other containers are executing, this container can use as
  much CPU as available on the host
- If other containers are executing, the amount of CPU available
to this container is (2 / (sum of cpu.shares of all active containers))

The fundamental problem with the current JVM implementation is that it
treats "CPU request" as a maximum value, the opposite of what Kubernetes
does. Because of this, in case (4), the JVM artificially limits itself
to a single CPU. This leads to CPU underutilization.

>> Specifically, I want to understand why the JDK is using
>> CgroupSubsystem::cpu_shares() to limit the number of CPUs used by the
>> Java process.
> TLDR: Kubernetes and/or other container orchestration frameworks? That
> was back in the day of cgroups v1, though.
>
>> In cgroup, there are other ways that are designed specifically for
>> limiting the number of CPUs, i.e., CgroupSubsystem::cpu_quota(). Why is
>> using cpu_quota() alone not enough? Why did we choose the current
>> approach of considering both cpu_quota() and cpu_shares()?
> Kubernetes has a concept of "cpu requests" and "cpu limit". It maps (or
> mapped?) those values to cpu shares and cpu quota in cgroups.
>
>> My guess is that sometimes people don't limit the actual number of CPUs
>> per container, but instead use CPU Shares to set the relative scheduling
>> priority between containers.
>>
>> I.e., they run "docker run --cpu-shares=1234" without using the "--cpus"
>> flag.
>>
>> If this is indeed the reason, I can understand the (good) intention, but
>> the solution seems awfully insufficient.
>>
>> CPU Shares is a *relative* number. How much CPU is allocated to you
>> depends on
>>
>> - how many other processes are actively running
>> - what their CPU Shares are
>>
>> The above information can change dynamically, as other processes may be
>> added or removed, and they can change between active and idle states.
>>
>> However, the JVM treats CPU Shares as an *absolute/static* number, and
>> sets the CPU quota of the current process using this very simplistic
>> formula.
>>
>> Value of /sys/fs/cgroup/cpu.shares -> cpu quota:
>>
>>      1023 -> 1 CPU
>>
1024 -> no limit (huh??)
>>      2048 -> 2 CPUs
>>      4096 -> 4 CPUs
>>
>> This seems just wrong to me. There's no way you can get a "correct"
>> result without knowing anything about other processes that are running
>> at the same time.
>>
>> The net effect is when Java is running under a container, more likely
>> than not, the JVM will limit itself to a single CPU. This seems really
>> inefficient to me.
> I believe the point is that popular container orchestration frameworks
> use the cpu requests feature to map to cpu.shares. A similar question
> regarding this was asked by myself a while ago. See JDK-8216366.
>
> Here is what Bob Vandette had to say at the time:
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2019-January/036093.html

To quote Bob's reply from the above e-mail:

Although the value for cpu-shares can be set to any of the values that
you mention, we decided to follow the convention set by Kubernetes and
other container orchestration products that use 1024 as the unit for
cpu shares. Ignoring the cpu shares in this case is not what users of
this popular technology want.

https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu

- The spec.containers[].resources.requests.cpu is converted to its core
  value, which is potentially fractional, and multiplied by 1024. The
  greater of this number or 2 is used as the value of the --cpu-shares
  flag in the docker run command.

- The spec.containers[].resources.limits.cpu is converted to its
  millicore value and multiplied by 100. The resulting value is the
  total amount of CPU time that a container can use every 100ms. A
  container cannot use more than its share of CPU time during this
  interval.

As I mentioned above, Bob's conclusion that cpu.shares should be used as
an upper limit value was probably based on the misunderstanding of what
resources.requests.cpu means in Kubernetes.
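
The two mappings discussed so far can be put side by side in a small Python sketch. This is not the JVM's or Kubernetes' actual code; the function names, the `host_cpus` cap and the quota-free setup are assumptions for illustration. `kubernetes_cpu_shares` follows the quoted Kubernetes text (request in cores times 1024, at least 2), and `jvm_cpus_from_shares` follows the shares table quoted earlier (round shares / 1024 up, with 1024 itself treated as "not set").

```python
import math

PER_CPU_SHARES = 1024  # the "1024 shares per CPU" convention quoted above


def kubernetes_cpu_shares(cpu_request_cores):
    """--cpu-shares value derived from requests.cpu, per the quoted Kubernetes text."""
    return max(2, int(cpu_request_cores * PER_CPU_SHARES))


def jvm_cpus_from_shares(shares, host_cpus):
    """CPU count implied by the shares-only mapping in the quoted table.

    Assumes no CPU quota is set; capping at host_cpus is an assumption
    of this sketch.
    """
    if shares == PER_CPU_SHARES:
        return host_cpus                      # 1024 is treated as "no limit"
    return min(host_cpus, math.ceil(shares / PER_CPU_SHARES))


if __name__ == "__main__":
    # Case (4): no request and no limit on a 32-CPU host.
    shares = kubernetes_cpu_shares(0)
    print(shares, jvm_cpus_from_shares(shares, 32))
```

Under these assumptions, case (4) yields cpu.shares = 2 and a CPU count of 1 on a 32-CPU host, which matches the behaviour reported in JDK-8279484: a value meant as a guaranteed minimum ends up acting as a cap.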
With resources.requests.cpu = 1.0, docker runs with --cpu-shares=1024

This means "I need at least 1 CPU to execute". However, JVM incorrectly
treats this as "I promise I will not use more than 1 CPU".

Thanks
- Ioi

> Thanks,
> Severin
>
>> What should we do?
>>
>> Thanks
>> - Ioi
>>
>> [1]https://bugs.openjdk.java.net/browse/JDK-8281181
>> [2]https://bugs.openjdk.java.net/browse/JDK-8279484
>>

From shade at openjdk.java.net Mon Feb 7 07:31:10 2022
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 7 Feb 2022 07:31:10 GMT
Subject: RFR: 8072070: Improve interpreter stack banging [v2]
In-Reply-To:
References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com>
Message-ID:

On Sat, 5 Feb 2022 05:50:34 GMT, Aleksey Shipilev wrote:

> > since this PR touches stackoverflow.hpp, Could you also take a look at this? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/stackOverflow.cpp#L66
> > we actually get the page size from os. why do we need alignment = 4k?
>
> Look here:
>
> https://github.com/openjdk/jdk/blob/48523b090886f7b24ed4009f0c150efaa6f7b056/src/hotspot/share/runtime/stackOverflow.cpp#L42-L45
> -- the `StackYellowPages`, `StackRedPages`, `StackShadowPages` are defined as 4K pages. It should probably be called `unit`, not `alignment`. I'd like to avoid scope creep for this PR, so that's for another day.

Done in #7362.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7247

From mdoerr at openjdk.java.net Mon Feb 7 10:18:10 2022
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Mon, 7 Feb 2022 10:18:10 GMT
Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 4 Feb 2022 15:45:48 GMT, Martin Doerr wrote:

>> s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details.
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix sender_sp.

Awesome! Thanks a lot for testing! I have also run all JFR jtreg tests I found and they have passed.
So, I will need reviews to proceed.
I'll backport to 17u and 11u (together with the other JFR related fixes).

-------------

PR: https://git.openjdk.java.net/jdk/pull/7312

From aph at openjdk.java.net Mon Feb 7 10:53:14 2022
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 7 Feb 2022 10:53:14 GMT
Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18]
In-Reply-To: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com>
References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com>
Message-ID: <-FQjRIxxiyiMqo8kEwUVP6XzCEfHIiRdbmlmGaPfXmA=.c56eee79-194a-425b-a645-775deb963d7b@github.com>

On Thu, 3 Feb 2022 16:51:48 GMT, Alan Hayward wrote:

>> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One
>> of its uses is to protect against ROP based attacks. This is done by
>> signing the Link Register whenever it is stored on the stack, and
>> authenticating the value when it is loaded back from the stack. If an
>> attacker were to try to change control flow by editing the stack then
>> the authentication check of the Link Register will fail, causing a
>> segfault when the function returns.
>>
>> On a system with PAC enabled, it is expected that all applications will
>> be compiled with ROP protection. Fedora 33 and upwards already provide
>> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of
>> PAC instructions that exist in the NOP space - on hardware without PAC,
>> these instructions act as NOPs, allowing backward compatibility for
>> negligible performance cost (2 NOPs per non-leaf function).
>>
>> Hardware is currently limited to the Apple M1 MacBooks. All testing has
>> been done within a Fedora Docker image. A run of SpecJVM showed no
>> difference to that of noise - which was surprising.
>>
>> The most important part of this patch is simply compiling using branch
>> protection provided by GCC/LLVM. This protects all C++ code from being
>> used in ROP attacks, removing all static ROP gadgets from use.
>>
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>>
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>>
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:
>
> Documentation updates

doc/building.md line 141:

> 139:
> 140: In order to use Branch Protection features in the VM, `--enable-branch-protection`
> 141: must be provided. This requires compiler support (GCC 9.1.0+ or Clang 10+). The

Suggestion:

must be used. This option requires C++ compiler support (GCC 9.1.0+ or Clang 10+). The

doc/building.md line 143:

> 141: must be provided. This requires compiler support (GCC 9.1.0+ or Clang 10+). The
> 142: resulting build can be run on both machines with and without support for branch
> 143: protection in hardware. This is only supported for Linux targets.

Suggestion:

protection in hardware. Branch Protection is only supported for Linux targets.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From aph at openjdk.java.net Mon Feb 7 11:02:11 2022
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 7 Feb 2022 11:02:11 GMT
Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18]
In-Reply-To: <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com>
References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com>
Message-ID:

On Mon, 7 Feb 2022 10:55:52 GMT, Andrew Haley wrote:

>> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Documentation updates
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5293:
>
>> 5291: // Create an additional frame for a function.
>> 5292: void MacroAssembler::enter_subframe() {
>> 5293: // Addresses can only be signed once, so strip it first. PAC safe because the value is not
>
> This needs a more descriptive name. `enter_and_sign()` ? No, that's not right either. How do we come up with a name that's more descriptive?

Because enter always enters a subframe. That's what it's for.
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Mon Feb 7 11:02:11 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 7 Feb 2022 11:02:11 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> Message-ID: <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com> On Thu, 3 Feb 2022 16:51:48 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. 
>>
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>>
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>>
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:
>
> Documentation updates

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 1163:

> 1161: #undef INSN
> 1162:
> 1163: // PAC branch instructions (with register modifier)

This section title is wrong. According to DDI0487G, the correct section title is "Unconditional branch (register)". All of the instructions in each section of this file should be grouped in the same way that they are in the Arm ARM.

src/hotspot/cpu/aarch64/frame_aarch64.cpp line 275:

> 273: if (TracePcPatching) {
> 274: tty->print_cr("patch_pc at address " INTPTR_FORMAT " [" INTPTR_FORMAT " -> " INTPTR_FORMAT "]",
> 275: p2i(pc_addr), p2i(*pc_addr), p2i(signed_pc));

Let's see both pc and signed pc here.
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5293: > 5291: // Create an additional frame for a function. > 5292: void MacroAssembler::enter_subframe() { > 5293: // Addresses can only be signed once, so strip it first. PAC safe because the value is not This needs a more descriptive name. `enter_and_sign()` ? No, that's not right either. How do we come up with a name that's more descriptive? ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Mon Feb 7 11:09:14 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 7 Feb 2022 11:09:14 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com> References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com> Message-ID: On Mon, 7 Feb 2022 10:58:59 GMT, Andrew Haley wrote: >> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: >> >> Documentation updates > > src/hotspot/cpu/aarch64/frame_aarch64.cpp line 275: > >> 273: if (TracePcPatching) { >> 274: tty->print_cr("patch_pc at address " INTPTR_FORMAT " [" INTPTR_FORMAT " -> " INTPTR_FORMAT "]", >> 275: p2i(pc_addr), p2i(*pc_addr), p2i(signed_pc)); > > Let's see both pc and signed pc here. > Let's see both pc and signed pc here, if they are different. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Mon Feb 7 11:09:14 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 7 Feb 2022 11:09:14 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> Message-ID: On Thu, 3 Feb 2022 16:51:48 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. 
>>
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>>
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>>
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:
>
> Documentation updates

src/hotspot/os_cpu/linux_aarch64/pauth_linux_aarch64.inline.hpp line 57:

> 55: register address r17 __asm("r17") = ret_addr;
> 56: register address r16 __asm("r16") = sp;
> 57: asm volatile (PACIA1716 : "+r"(r17) : "r"(r16));

I don't see the point of `volatile` here, any more than you'd use volatile on an addition. `volatile` is when you have a side effect you care about.
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Mon Feb 7 11:17:18 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 7 Feb 2022 11:17:18 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> Message-ID: On Thu, 3 Feb 2022 16:51:48 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. 
>>
>> The remainder of the patch adds ROP protection to runtime generated
>> code, in both stubs and compiled Java code. Attacks here are much harder
>> as ROP gadgets must be found dynamically at runtime. If/when AOT
>> compilation is added to JDK, then all stubs and compiled Java will be
>> susceptible to ROP gadgets being found by static analysis and therefore
>> potentially as vulnerable as C++ code.
>>
>> There are a number of places where the VM changes control flow by
>> rewriting the stack or otherwise. I've done some analysis as to how
>> these could also be used for attacks (which I didn't want to post here).
>> These areas can be protected by ensuring the pointers to various stubs and
>> entry points are stored in memory as signed pointers. These changes are
>> simple to make (they can be reduced to a type change in common code and
>> a few additional sign/auth calls in the backend), but there are a lot of them
>> and the total code change is fairly large. I'm happy to provide a few
>> work in progress patches.
>>
>> In order to match the security benefits of the Apple Arm64e ABI across
>> the whole of JDK, then all the changes mentioned above would be
>> required.
>
> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision:
>
> Documentation updates

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5328:

> 5326: // Uses the FP from the start of the function as the modifier - which is stored at the address of
> 5327: // the current FP.
> 5328: //

Is it? C2 uses FP as a scratch register. I guess we know that this is never used in C2-generated code? I'm tempted to put an assertion here, just in case. Or does it not matter?
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Mon Feb 7 11:32:15 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 7 Feb 2022 11:32:15 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com> Message-ID: On Mon, 7 Feb 2022 11:06:20 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/frame_aarch64.cpp line 275: >> >>> 273: if (TracePcPatching) { >>> 274: tty->print_cr("patch_pc at address " INTPTR_FORMAT " [" INTPTR_FORMAT " -> " INTPTR_FORMAT "]", >>> 275: p2i(pc_addr), p2i(*pc_addr), p2i(signed_pc)); >> >> Let's see both pc and signed pc here. > >> Let's see both pc and signed pc here, if they are different. Are you sure? At the moment with PAC we get: patch_pc at address 0x0000fffff58edf98 [0x0068ffffed17b5fc -> 0x00abffffed17b7f8] With both signed and unsigned you'd have: patch_pc at address 0x0000fffff58edf98 [0x0068ffffed17b5fc (0x0000ffffed17b5fc) -> 0x00abffffed17b7f8 (0x0000ffffed17b7f8)] I prefer the first - it's shorter and you can infer the address from the signed version. Happy to go with the longer version if you think the shorter version is confusing. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Mon Feb 7 11:46:20 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 7 Feb 2022 11:46:20 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com> Message-ID: On Mon, 7 Feb 2022 11:28:30 GMT, Alan Hayward wrote: >>> Let's see both pc and signed pc here, if they are different. > > Are you sure? At the moment with PAC we get: > > patch_pc at address 0x0000fffff58edf98 [0x0068ffffed17b5fc -> 0x00abffffed17b7f8] > > With both signed and unsigned you'd have: > > patch_pc at address 0x0000fffff58edf98 [0x0068ffffed17b5fc (0x0000ffffed17b5fc) -> 0x00abffffed17b7f8 (0x0000ffffed17b7f8)] > > I prefer the first - it's shorter and you can infer the address from the signed version. Happy to go with the longer version if you think the shorter version is confusing. You've been looking at PAC-signed addresses for a long time. Let's see "at address [prev true dest -> new true dest] [signed prev signed dest -> new signed dest]", but only show the signed dests if they're different. So it appears as `patch_pc at address 0x0000fffff58edf98 [0x0068ffffed17b5fc -> 0x00abffffed17b7f8] [signed 0x0000ffffed17b5fc -> 0x0000ffffed17b7f8]` . 
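
[Editorial sketch] The trace layout proposed in the prose above (true destinations first, then the signed pair only when it differs) could look roughly like this in standalone C++. It is an illustration, not the HotSpot change: `format_patch_pc` and its parameters are hypothetical names, and which values belong under the "signed" label follows the prose description rather than the sample line.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: builds the patch_pc trace line from the stripped
// ("true") return addresses and their PAC-signed counterparts.
std::string format_patch_pc(uint64_t pc_addr,
                            uint64_t prev_pc, uint64_t new_pc,
                            uint64_t signed_prev_pc, uint64_t signed_new_pc) {
    char buf[256];
    int n = std::snprintf(buf, sizeof(buf),
        "patch_pc at address 0x%016" PRIx64 " [0x%016" PRIx64 " -> 0x%016" PRIx64 "]",
        pc_addr, prev_pc, new_pc);
    assert(n > 0 && n < static_cast<int>(sizeof(buf)));
    std::string line(buf);

    // Only show the signed pair when signing actually changed a value,
    // i.e. when PAC is in use.
    if (signed_prev_pc != prev_pc || signed_new_pc != new_pc) {
        n = std::snprintf(buf, sizeof(buf),
            " [signed 0x%016" PRIx64 " -> 0x%016" PRIx64 "]",
            signed_prev_pc, signed_new_pc);
        assert(n > 0 && n < static_cast<int>(sizeof(buf)));
        line += buf;
    }
    return line;
}
```

On hardware without PAC the signed and stripped values coincide, so the line keeps its current, shorter shape.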
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Mon Feb 7 11:46:21 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 7 Feb 2022 11:46:21 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com> Message-ID: On Mon, 7 Feb 2022 10:57:15 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5293: >> >>> 5291: // Create an additional frame for a function. >>> 5292: void MacroAssembler::enter_subframe() { >>> 5293: // Addresses can only be signed once, so strip it first. PAC safe because the value is not >> >> This needs a more descriptive name. `enter_and_sign()` ? No, that's not right either. How do we come up with a name that's more descriptive? > > Because enter always enters a subframe. That's what it's for. enter_nested() ? enter_inner() ? ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Mon Feb 7 11:46:21 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 7 Feb 2022 11:46:21 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> Message-ID: On Mon, 7 Feb 2022 11:11:02 GMT, Andrew Haley wrote: >> Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: >> >> Documentation updates > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5328: > >> 5326: // Uses the FP from the start of the function as the modifier - which is stored at the address of >> 5327: // the current FP. >> 5328: // > > Is it? C2 uses FP as a scratch register. 
I guess we know that this is never used in C2-generated code? I'm tempted to put an assertion here, just in case. Or does it not matter? Allocating FP is disabled for rop protection: aarch64.md has: // r29 is not allocatable when PreserveFramePointer or ROP protection is on if (PreserveFramePointer || VM_Version::use_rop_protection()) { I think that covers it. What assertion would you want to check? ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Mon Feb 7 11:57:12 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 7 Feb 2022 11:57:12 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com> Message-ID: On Mon, 7 Feb 2022 11:43:13 GMT, Andrew Haley wrote: >> Are you sure? At the moment with PAC we get: >> >> patch_pc at address 0x0000fffff58edf98 [0x0068ffffed17b5fc -> 0x00abffffed17b7f8] >> >> With both signed and unsigned you'd have: >> >> patch_pc at address 0x0000fffff58edf98 [0x0068ffffed17b5fc (0x0000ffffed17b5fc) -> 0x00abffffed17b7f8 (0x0000ffffed17b7f8)] >> >> I prefer the first - it's shorter and you can infer the address from the signed version. Happy to go with the longer version if you think the shorter version is confusing. > > You've been looking at PAC-signed addresses for a long time. > Let's see "at address [prev true dest -> new true dest] [signed prev signed dest -> new signed dest]", but only show the signed dests if they're different. So it appears as > `patch_pc at address 0x0000fffff58edf98 [0x0068ffffed17b5fc -> 0x00abffffed17b7f8] [signed 0x0000ffffed17b5fc -> 0x0000ffffed17b7f8]` . 
ok, that looks better than my longer version, I'll go with that ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Mon Feb 7 11:57:12 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 7 Feb 2022 11:57:12 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> Message-ID: On Mon, 7 Feb 2022 11:41:57 GMT, Alan Hayward wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5328: >> >>> 5326: // Uses the FP from the start of the function as the modifier - which is stored at the address of >>> 5327: // the current FP. >>> 5328: // >> >> Is it? C2 uses FP as a scratch register. I guess we know that this is never used in C2-generated code? I'm tempted to put an assertion here, just in case. Or does it not matter? > > Allocating FP is disabled for rop protection: > > aarch64.md has: > // r29 is not allocatable when PreserveFramePointer or ROP protection is on > if (PreserveFramePointer || VM_Version::use_rop_protection()) { > > I think that covers it. > What assertion would you want to check? If `UseROPProtection` is on, is there any reason not to set `PreserveFramePointer`, and assert here that it is set? It is a crucial assumption, so let's assert it. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Mon Feb 7 12:01:13 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 7 Feb 2022 12:01:13 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com> Message-ID: On Mon, 7 Feb 2022 11:42:43 GMT, Alan Hayward wrote: >> Because enter always enters a subframe. That's what it's for. 
>
> enter_nested() ?
> enter_inner() ?

Tell you what, first put a comment here that says when it should (and therefore, should not) be used. Once it's clear exactly what this is for, thinking of a name might be easier.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6334

From duke at openjdk.java.net Mon Feb 7 12:04:16 2022
From: duke at openjdk.java.net (Alan Hayward)
Date: Mon, 7 Feb 2022 12:04:16 GMT
Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18]
In-Reply-To:
References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com>
Message-ID: <32e7_CnkkIaj2GOsvi9mT-xzgLO8B60uHrzMEAZXHko=.2ea9eaff-39c6-4401-9820-4536f03d5ec7@github.com>

On Mon, 7 Feb 2022 11:54:09 GMT, Andrew Haley wrote:

>> Allocating FP is disabled for rop protection:
>>
>> aarch64.md has:
>> // r29 is not allocatable when PreserveFramePointer or ROP protection is on
>> if (PreserveFramePointer || VM_Version::use_rop_protection()) {
>>
>> I think that covers it.
>> What assertion would you want to check?
>
> If `UseROPProtection` is on, is there any reason not to set `PreserveFramePointer`, and assert here that it is set? It is a crucial assumption, so let's assert it.

PreserveFramePointer is doing some additional stuff. I'll give it a test to make sure everything still works with PreserveFramePointer fully set. It would make things easier just to force it set with rop protection on.
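
[Editorial sketch] The idea of forcing PreserveFramePointer whenever ROP protection is on, and asserting that assumption where FP is used as the PAC modifier, can be illustrated in isolation. This is not HotSpot code: `VmFlags`, `apply_rop_ergonomics` and `fp_is_allocatable` are hypothetical stand-ins for the real flag handling.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two VM settings named in the thread.
struct VmFlags {
    bool use_rop_protection;
    bool preserve_frame_pointer;
};

// Flag ergonomics: ROP protection implies PreserveFramePointer.
void apply_rop_ergonomics(VmFlags& flags) {
    if (flags.use_rop_protection) {
        flags.preserve_frame_pointer = true;
    }
}

// Once the flag is forced, the register allocator check needs only one
// condition, and the crucial assumption can be asserted at the use site.
bool fp_is_allocatable(const VmFlags& flags) {
    assert(!flags.use_rop_protection || flags.preserve_frame_pointer);
    return !flags.preserve_frame_pointer;
}
```

With this shape, the two-condition check quoted from aarch64.md would collapse to a test of PreserveFramePointer alone, at the cost of the extra work PreserveFramePointer does.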
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Mon Feb 7 12:29:21 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 7 Feb 2022 12:29:21 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: <32e7_CnkkIaj2GOsvi9mT-xzgLO8B60uHrzMEAZXHko=.2ea9eaff-39c6-4401-9820-4536f03d5ec7@github.com> References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <32e7_CnkkIaj2GOsvi9mT-xzgLO8B60uHrzMEAZXHko=.2ea9eaff-39c6-4401-9820-4536f03d5ec7@github.com> Message-ID: On Mon, 7 Feb 2022 12:01:18 GMT, Alan Hayward wrote: > PreserveFramePointer is doing some additional stuff. I'll give it a test to make sure everything still works with PreserveFramePointer fully set. It would make things easier just to force it set with rop protection on. Using PreserveFramePointer greatly simplifies the testing matrix, and has little adverse performance impact beyond disallowing C2 from allocating FP as a scratch register. It also simplifies this patch, which would be a very Good Thing. Let's do it. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From jiefu at openjdk.java.net Mon Feb 7 12:43:02 2022 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 7 Feb 2022 12:43:02 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v2] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Mon, 7 Feb 2022 07:27:57 GMT, Aleksey Shipilev wrote: >>> since you this PR touches stackoverflow.hpp, Could you also take a look at this? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/stackOverflow.cpp#L66 >>> >>> we actually get the page size from os. why do we need alignment = 4k? 
>> >> Look here: https://github.com/openjdk/jdk/blob/48523b090886f7b24ed4009f0c150efaa6f7b056/src/hotspot/share/runtime/stackOverflow.cpp#L42-L45 -- the `StackYellowPages`, `StackRedPages`, `StackShadowPages` are defined as 4K pages. It should probably be called `unit`, not `alignment`. I'd like to avoid scope creep for this PR, so that's for another day. > >> > since this PR touches stackoverflow.hpp, could you also take a look at this? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/stackOverflow.cpp#L66 >> > we actually get the page size from os. why do we need alignment = 4k? >> >> Look here: >> >> https://github.com/openjdk/jdk/blob/48523b090886f7b24ed4009f0c150efaa6f7b056/src/hotspot/share/runtime/stackOverflow.cpp#L42-L45 >> -- the `StackYellowPages`, `StackRedPages`, `StackShadowPages` are defined as 4K pages. It should probably be called `unit`, not `alignment`. I'd like to avoid scope creep for this PR, so that's for another day. > > Done in #7362. Hi @shipilev , Did you test the perf improvement based on the latest jdk? I tried to test SPECjvm2008's `compiler.compiler` with jdk19, but it failed with Benchmark: compiler.compiler Run mode: timed run Test type: multi Threads: 8 Warmup: 120s Iterations: 1 Run length: 240s Error in setup of Benchmark. spec.harness.StopBenchmarkException: Error invoking bmSetupBenchmarkMethod at spec.harness.ProgramRunner.invokeBmSetupBenchmark(ProgramRunner.java:185) at spec.harness.ProgramRunner.runBenchmark(ProgramRunner.java:301) at spec.harness.ProgramRunner.run(ProgramRunner.java:98) Caused by: java.lang.reflect.InvocationTargetException at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:119) at java.base/java.lang.reflect.Method.invoke(Method.java:577) at spec.harness.ProgramRunner.invokeBmSetupBenchmark(ProgramRunner.java:183) ...
2 more Caused by: java.lang.NoClassDefFoundError: com/sun/tools/javac/util/JavacFileManager at java.base/java.lang.ClassLoader.defineClass1(Native Method) at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1013) at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521) at spec.benchmarks.compiler.MainBase.preSetupBenchmark(MainBase.java:38) at spec.benchmarks.compiler.compiler.Main.setupBenchmark(Main.java:38) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ... 4 more Caused by: java.lang.ClassNotFoundException: com.sun.tools.javac.util.JavacFileManager at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521) ... 16 more Warmup (120s) begins: Mon Feb 07 20:35:40 CST 2022 Warmup (120s) ends: Mon Feb 07 20:35:40 CST 2022 Warmup (120s) result: **NOT VALID** Am I missing something? Thanks.
------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From duke at openjdk.java.net Mon Feb 7 13:47:14 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 7 Feb 2022 13:47:14 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com> Message-ID: <-V7ptCS4QdcpFHOomMnTPPYvFtKSQ0nswzFNXQDoWLg=.2d72897f-ef45-4867-892f-64df085eca85@github.com> On Mon, 7 Feb 2022 11:58:22 GMT, Andrew Haley wrote: >> enter_nested() ? >> enter_inner() ? > > Tell you what, first put a comment here that says when it should (and therefore, should not) be used. Once it's clear exactly what this is for, thinking of a name might be easier. How about extending the existing enter() function: // Enter a new stack frame for the current method. // nested: Indicates a frame has already been entered (and not left) for the current method. void MacroAssembler::enter(bool nested=false) { if (nested) strip() protect() stp() mov() } This would add an additional bool check for every call of enter() - that's at code generation time, so probably not an issue. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Mon Feb 7 13:58:51 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 7 Feb 2022 13:58:51 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v19] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack.
If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large.
I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Review fixups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/d97883b5..614a3262 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=18 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=17-18 Stats: 20 lines in 4 files changed: 7 ins; 6 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From coleenp at openjdk.java.net Mon Feb 7 14:41:06 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 7 Feb 2022 14:41:06 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v2] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Mon, 7 Feb 2022 14:20:54 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Rectify comment "(c)" > > src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp line 758: > >> 756: } >> 757: >> 758: __ cmpptr(rsp, Address(thread, JavaThread::shadow_zone_safe_limit())); > > stack watermark starts at stack_base, increase to current rsp to optimize away stack banging for rsp greater than this. Cannot increase watermark if esp < shadow_zone_safe_limit because of ... looking for a brief comment why here.
------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From coleenp at openjdk.java.net Mon Feb 7 14:41:06 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 7 Feb 2022 14:41:06 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v2] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Sun, 6 Feb 2022 08:03:39 GMT, Aleksey Shipilev wrote: >> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. >> >> The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. >> >> This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration. >> >> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around. 
>> >> I think it is fairly complete, and so would like to solicit more feedback and testing here. >> >> Point runs on SPECjvm2008 with `-Xint` show huge improvements on half of the tests, without any regressions: >> >> >> compiler.compiler: +77% >> compiler.sunflow: +69% >> compress: +166% >> crypto.rsa: +15% >> crypto.signverify: +70% >> mpegaudio: +8% >> serial: +50% >> sunflow: +57% >> xml.transform: +61% >> xml.validation: +43% >> >> >> My new `java.lang.invoke` benchmarks improve a lot as well: >> >> >> Benchmark Mode Cnt Score Error Units >> >> # Mainline >> MHInvoke.methodHandle avgt 5 799.671 ± 9.087 ns/op >> MHInvoke.plain avgt 5 261.947 ± 1.421 ns/op >> VHGet.plain avgt 5 231.372 ± 3.044 ns/op >> VHGet.varHandle avgt 5 924.880 ± 6.026 ns/op >> >> # This WIP >> MHInvoke.methodHandle avgt 5 240.456 ± 3.931 ns/op >> MHInvoke.plain avgt 5 70.851 ± 1.986 ns/op >> VHGet.plain avgt 5 52.506 ± 3.768 ns/op >> VHGet.varHandle avgt 5 335.785 ± 4.398 ns/op >> >> >> It also palpably improves startup even on small HelloWorld, _even when compilers are present_: >> >> >> $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% ) >> 96 context-switches # 4.353 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 333.181 /sec ( +- 0.32% ) >> 2,437 page-faults # 110.469 K/sec ( +- 0.00% ) >> 78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%) >> 2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%) >> 2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%) >> 67,296,528 instructions # 0.85 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%) >> 12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%) >> 384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%) >> >> 0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41%
) >> >> $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% ) >> 98 context-switches # 4.519 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 339.292 /sec ( +- 0.31% ) >> 2,434 page-faults # 111.755 K/sec ( +- 0.00% ) >> 77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%) >> 2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%) >> 2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%) >> 66,742,892 instructions # 0.86 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%) >> 12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%) >> 386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%) >> >> 0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% ) >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1` >> - [x] Linux x86_64 fastdebug, `tier2` >> - [x] Linux x86_64 fastdebug, `tier3` >> - [x] Linux x86_32 fastdebug, `tier1` >> - [x] Linux x86_32 fastdebug, `tier2` >> - [x] Linux x86_32 fastdebug, `tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Rectify comment "(c)" This looks like a nice optimization to me. I initially thought that calling the new limit of where we can elide stack banging "watermark" had something to do with the GC stack watermark code but "watermark" is sort of the best word for this. If we had another descriptive word that might be better, but I can't think of anything. Thank you for fixing this! We didn't have tests showing the motivation ourselves so sorry that we ignored it for so long. 
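For readers following the "watermark" discussion, here is a rough model of the check as I read it from this thread. It is only a sketch under stated assumptions: `StackBangState`, `make_state` and `on_method_entry` are names made up for illustration, the stack is assumed to grow towards lower addresses, and the real patch emits this logic as generated assembly in the template interpreter rather than as C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-thread state; field names only loosely follow the thread.
struct StackBangState {
  std::uintptr_t safe_limit;        // at or below this sp, a bang may reach the guard zone
  std::uintptr_t growth_watermark;  // shadow zone is known to be banged for any sp above this
  int bangs;                        // counts full shadow-zone bangs, for illustration only
};

// The watermark starts at the stack base, so the first entry always bangs.
inline StackBangState make_state(std::uintptr_t stack_base, std::uintptr_t safe_limit) {
  assert(safe_limit < stack_base);
  return StackBangState{safe_limit, stack_base, 0};
}

// Models what happens on every interpreted method entry at stack pointer `sp`.
inline void on_method_entry(StackBangState& s, std::uintptr_t sp) {
  if (sp > s.growth_watermark) {
    return;  // fast path: this region was banged earlier, nothing to touch
  }
  ++s.bangs;  // slow path: bang the whole shadow zone below sp
  if (sp > s.safe_limit) {
    s.growth_watermark = sp;  // remember how deep we have safely banged
  }
  // At or below the safe limit the watermark is left alone, so it always stays
  // above the safe limit and deep frames keep banging on every entry.
}
```

In this model the common case (re-entering at a depth that was already reached) costs one comparison instead of touching every page of the shadow zone, which is where the `-Xint` gains quoted above would come from.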
src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp line 758: > 756: } > 757: > 758: __ cmpptr(rsp, Address(thread, JavaThread::shadow_zone_safe_limit())); stack watermark starts at stack_base, increase to current rsp to optimize away stack banging for rsp greater than this. Cannot increase watermark if esp < shadow_zone_safe_limit because of ... ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7247 From chagedorn at openjdk.java.net Mon Feb 7 15:04:10 2022 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 7 Feb 2022 15:04:10 GMT Subject: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v3] In-Reply-To: References: Message-ID: > When printing the native stack trace on Linux (mostly done for hs_err files), it only prints the method with its parameters and a relative offset in the method: > > Stack: [0x00007f6e01739000,0x00007f6e0183a000], sp=0x00007f6e01838110, free space=1020k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 > V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec > V [libjvm.so+0x8303ef] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 > V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df > V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69 > V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d > V [libjvm.so+0x12091c9] JavaThread::run()+0x167 > V [libjvm.so+0x1206ada] Thread::call_run()+0x180 > V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f > > This makes it sometimes difficult to see where exactly the methods were called from and sometimes almost impossible when there are multiple invocations of the same method within one method. 
> > This patch improves this by providing source information (filename + line number) to the native stack traces on Linux similar to what's already done on Windows (see [JDK-8185712](https://bugs.openjdk.java.net/browse/JDK-8185712)): > > Stack: [0x00007f34fca18000,0x00007f34fcb19000], sp=0x00007f34fcb17110, free space=1020k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 (c1_Compilation.cpp:607) > V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec (c1_Compiler.cpp:250) > V [libjvm.so+0x8303ef] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 (compileBroker.cpp:2291) > V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df (compileBroker.cpp:1966) > V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69 (compilerThread.cpp:59) > V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d (thread.cpp:1297) > V [libjvm.so+0x12091c9] JavaThread::run()+0x167 (thread.cpp:1280) > V [libjvm.so+0x1206ada] Thread::call_run()+0x180 (thread.cpp:358) > V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f (os_linux.cpp:705) > > For Linux, we need to parse the debug symbols which are generated by GCC in DWARF - a standardized debugging format. This patch adds support for DWARF 4, the default of GCC 10.x, for 32 and 64 bit architectures (tested with x86_32, x86_64 and AArch64). DWARF 5 is not supported as it was still experimental and not generated for HotSpot. However, newer GCC version may soon generate DWARF 5 by default in which case this parser either needs to be extended or the build of HotSpot configured to only emit DWARF 4. > > The code follows the parsing steps described in the official DWARF 4 spec: https://dwarfstd.org/doc/DWARF4.pdf > I added references to the corresponding sections throughout the code. 
However, I tried to explain the steps from the DWARF spec directly in the code (method names, comments etc.). This allows to follow the code without the need to actually deep dive into the spec. > > The comments at the `Dwarf` class in the `elf.hpp` file explain in more detail how a DWARF file is structured and how the parsing algorithm works to get to the filename and line number information. There are more class comments throughout the `elf.hpp` file about how different DWARF sections are structured and how the parsing algorithm needs to fetch the required information. Therefore, I will not repeat the exact workings of the algorithm here but refer to the code comments. I've tried to add as much information as possible to improve the readability. > > Generally, I've tried to stay away from adding any assertions as this code is almost always executed when already processing a VM error. Instead, the DWARF parser aims to just exit gracefully and possibly omit source information for a stack frame instead of risking to stop writing the hs_err file when an assertion would have failed. To debug failures, `-Xlog:dwarf` can be used with `info`, `debug` or `trace` which provides logging messages throughout parsing. > > **Testing:** > Apart from manual testing, I've added two kinds of tests: > - A JTreg test: Spawns new VMs to let them crash in various ways. The test reads the created hs_err files to check if the DWARF parsing could correctly find the filename and line number. For normal HotSpot files, I could not check against hardcoded filenames and line numbers as they are subject to change (especially line number can quickly become different). I therefore just added some sanity checks in the form of "found a non-empty file" and "found a non-zero line number". On top of that, I added tests that let the VM crash in custom C files (which will not change). This enables an additional verification of hardcoded filenames and line numbers. 
> - Gtests: Directly calling the `get_source()` method which initiates DWARF parsing. Tested some special cases, for example, having a buffer that is not big enough to store the filename. > > On top of that, there are also existing JTreg tests that call `-XX:NativeMemoryTracking=detail` which will print a native stack trace with the new source information. These tests were also run as part of the standard tier testing and can be considered as sanity tests for this implementation. > > To make tests work in our infrastructure or if some other setups want to have debug symbols at different locations, I've added support for an additional `_JVM_DWARF_PATH` environment variable. This variable can specify a path from which the DWARF symbol file should be read by the parser if the default locations do not contain debug symbols (required some `make` changes). This is similar to what's done on Windows with `_NT_SYMBOL_PATH`. The JTreg test, however, also works if there are no symbols available. In that case, the test just skips all the assertion checks for the filename and line number. > > I haven't run any specific performance testing as this new code is mainly executed when an error will exit the VM and only if symbol files are available (which is normally not the case when using Java release builds as a user). > > Special thanks to @tschatzl for giving me some pointers to start based on his knowledge from a DWARF 2 parser he once wrote in Pascal and for discussing approaches on how to retrieve the source information and to @erikj79 for providing help for the changes required for `make`! 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Change log_* to log_develop_* and log_warning to log_develop_info ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7126/files - new: https://git.openjdk.java.net/jdk/pull/7126/files/7ddb7737..698663b9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7126&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7126&range=01-02 Stats: 74 lines in 2 files changed: 0 ins; 0 del; 74 mod Patch: https://git.openjdk.java.net/jdk/pull/7126.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7126/head:pull/7126 PR: https://git.openjdk.java.net/jdk/pull/7126 From aph at openjdk.java.net Mon Feb 7 15:15:17 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 7 Feb 2022 15:15:17 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: <-V7ptCS4QdcpFHOomMnTPPYvFtKSQ0nswzFNXQDoWLg=.2d72897f-ef45-4867-892f-64df085eca85@github.com> References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com> <-V7ptCS4QdcpFHOomMnTPPYvFtKSQ0nswzFNXQDoWLg=.2d72897f-ef45-4867-892f-64df085eca85@github.com> Message-ID: <-nQf8_Gh666U_KH2wCMBEApxI3GFXre1cghHN41KoVg=.c0bc85fd-16ed-49f5-a595-73893facf6df@github.com> On Mon, 7 Feb 2022 13:43:55 GMT, Alan Hayward wrote: >> Tell you what, first put a comment here that says when it should (and therefore, should not) be used. Once it's clear exactly what this is for, thinking of a name might be easier. > > How about extending the existing enter() function: > > // Enter a new stack frame for the current method. > // nested: Indicates a frame has already been entered (and not left) for the current method.
> void MacroAssembler::enter(bool nested=false) { > if (nested) strip() > protect() > stp() > mov() > } > > This would add an additional bool check for every call of enter() - that's at code generation time, so probably not an issue. So, `nested` is true iff we are, say, pushing an extra frame for a runtime call in the middle of generated code, but for some mysterious reason the logic is inline instead of being implemented in the obvious way as a stub. Please do this as: ` MacroAssembler::enter(bool strip_return_address=false)` and I'll be happy. Please make sure that all calls are commented, as in `__ enter(/*strip_return_address*/true);` and I'll be happy. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From shade at openjdk.java.net Mon Feb 7 15:28:08 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 7 Feb 2022 15:28:08 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v2] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Mon, 7 Feb 2022 12:40:24 GMT, Jie Fu wrote: > I tried to test SPECjvm2008's `compiler.compiler` with jdk19. I don't think currently public SPECjvm2008 works with JDK 19 due to missing dependencies. I have a hacky version that is able to work with modern JDK. I used that to estimate performance on JDK mainline. ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From shade at openjdk.java.net Mon Feb 7 15:42:46 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 7 Feb 2022 15:42:46 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v3] In-Reply-To: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: > This is an old issue, I submitted the first RFE about this back in 2015. 
This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra to work reasonably fast when cold, which includes lots of interpreter paths. > > The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. > > This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration. > > I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around. > > I think it is fairly complete, and so would like to solicit more feedback and testing here. > > Point runs on SPECjvm2008 with `-Xint` show huge improvements on half of the tests, without any regressions: > > > compiler.compiler: +77% > compiler.sunflow: +69% > compress: +166% > crypto.rsa: +15% > crypto.signverify: +70% > mpegaudio: +8% > serial: +50% > sunflow: +57% > xml.transform: +61% > xml.validation: +43% > > > My new `java.lang.invoke` benchmarks improve a lot as well: > > > Benchmark Mode Cnt Score Error Units > > # Mainline > MHInvoke.methodHandle avgt 5 799.671 ±
9.087 ns/op > MHInvoke.plain avgt 5 261.947 ± 1.421 ns/op > VHGet.plain avgt 5 231.372 ± 3.044 ns/op > VHGet.varHandle avgt 5 924.880 ± 6.026 ns/op > > # This WIP > MHInvoke.methodHandle avgt 5 240.456 ± 3.931 ns/op > MHInvoke.plain avgt 5 70.851 ± 1.986 ns/op > VHGet.plain avgt 5 52.506 ± 3.768 ns/op > VHGet.varHandle avgt 5 335.785 ± 4.398 ns/op > > > It also palpably improves startup even on small HelloWorld, _even when compilers are present_: > > > $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null > > Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs): > > 22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% ) > 96 context-switches # 4.353 K/sec ( +- 0.07% ) > 7 cpu-migrations # 333.181 /sec ( +- 0.32% ) > 2,437 page-faults # 110.469 K/sec ( +- 0.00% ) > 78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%) > 2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%) > 2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%) > 67,296,528 instructions # 0.85 insn per cycle > # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%) > 12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%) > 384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%) > > 0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% ) > > $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null > > Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs): > > 21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% ) > 98 context-switches # 4.519 K/sec ( +- 0.07% ) > 7 cpu-migrations # 339.292 /sec ( +- 0.31% ) > 2,434 page-faults # 111.755 K/sec ( +- 0.00% ) > 77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%) > 2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%) > 2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%) > 66,742,892
instructions # 0.86 insn per cycle > # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%) > 12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%) > 386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%) > > 0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% ) > > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1` > - [x] Linux x86_64 fastdebug, `tier2` > - [x] Linux x86_64 fastdebug, `tier3` > - [x] Linux x86_32 fastdebug, `tier1` > - [x] Linux x86_32 fastdebug, `tier2` > - [x] Linux x86_32 fastdebug, `tier3` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7247/files - new: https://git.openjdk.java.net/jdk/pull/7247/files/c3983819..2c710882 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7247&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7247&range=01-02 Stats: 4 lines in 2 files changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7247.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7247/head:pull/7247 PR: https://git.openjdk.java.net/jdk/pull/7247 From shade at openjdk.java.net Mon Feb 7 15:42:48 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 7 Feb 2022 15:42:48 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v2] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Mon, 7 Feb 2022 14:23:38 GMT, Coleen Phillimore wrote: >> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp line 758: >> >>> 756: } >>> 757: >>> 758: __ cmpptr(rsp, Address(thread, JavaThread::shadow_zone_safe_limit())); >> >> stack watermark starts at stack_base, increase to current rsp to optimize away stack banging for rsp greater than this. Cannot increase watermark if esp < shadow_zone_safe_limit because of ... 
> > looking for a brief comment why here. See new commit. I realized I need to write down explicitly that growth watermark is always above the safe limit, for the check to be safe. ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From shade at openjdk.java.net Mon Feb 7 15:50:11 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 7 Feb 2022 15:50:11 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v3] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: <8OYBV_pnf17dKEOSLQPB7HucivwweoUv9apRxVd_Oik=.2495bcae-b6f9-46fe-a3bc-00d32deba2ae@github.com> On Mon, 7 Feb 2022 15:42:46 GMT, Aleksey Shipilev wrote: >> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. >> >> The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. >> >> This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration. 
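The safe limit / growth watermark idea described above can be modelled in a few lines of plain C++. This is only a sketch under stated assumptions, not the code from the patch: `StackModel`, `on_method_entry`, the page count, and the addresses used with it are all hypothetical, and a downward-growing stack is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of one thread's stack bookkeeping (names are illustrative).
struct StackModel {
  std::uintptr_t safe_limit;        // at or below this SP, banging may reach the guard zone
  std::uintptr_t growth_watermark;  // lowest SP whose shadow zone is known to be banged
  std::size_t pages_banged;         // counts simulated page touches
};

constexpr std::size_t kShadowPages = 20;  // assumed shadow zone size, in pages

// Simulates the check done on method entry. Returns true if the shadow zone
// had to be banged, false if the fast path was taken.
inline bool on_method_entry(StackModel& s, std::uintptr_t sp) {
  if (sp > s.growth_watermark) {
    return false;  // shallower than a depth that was already banged: nothing to do
  }
  s.pages_banged += kShadowPages;  // stand-in for touching every shadow page
  if (sp > s.safe_limit) {
    s.growth_watermark = sp;  // remember the new depth only while above the safe limit
  }
  return true;
}
```

With the watermark starting at the stack base, the first entry bangs once, later entries at shallower depths skip the banging, and entries at or below the safe limit keep banging every time because the watermark is never moved past the limit.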
>> >> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around. >> >> I think it is fairly complete, and so would like to solicit more feedback and testing here. >> >> Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions: >> >> >> compiler.compiler: +77% >> compiler.sunflow: +69% >> compress: +166% >> crypto.rsa: +15% >> crypto.signverify: +70% >> mpegaudio: +8% >> serial: +50% >> sunflow: +57% >> xml.transform: +61% >> xml.validation: +43% >> >> >> My new `java.lang.invoke` benchmarks improve a lot as well: >> >> >> Benchmark Mode Cnt Score Error Units >> >> # Mainline >> MHInvoke.methodHandle avgt 5 799.671 ± 9.087 ns/op >> MHInvoke.plain avgt 5 261.947 ± 1.421 ns/op >> VHGet.plain avgt 5 231.372 ± 3.044 ns/op >> VHGet.varHandle avgt 5 924.880 ± 6.026 ns/op >> >> # This WIP >> MHInvoke.methodHandle avgt 5 240.456 ± 3.931 ns/op >> MHInvoke.plain avgt 5 70.851 ± 1.986 ns/op >> VHGet.plain avgt 5 52.506 ± 3.768 ns/op >> VHGet.varHandle avgt 5 335.785 ±
4.398 ns/op >> >> >> It also palpably improves startup even on small HelloWorld, _even when compilers are present_: >> >> >> $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% ) >> 96 context-switches # 4.353 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 333.181 /sec ( +- 0.32% ) >> 2,437 page-faults # 110.469 K/sec ( +- 0.00% ) >> 78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%) >> 2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%) >> 2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%) >> 67,296,528 instructions # 0.85 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%) >> 12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%) >> 384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%) >> >> 0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% ) >> >> $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% ) >> 98 context-switches # 4.519 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 339.292 /sec ( +- 0.31% ) >> 2,434 page-faults # 111.755 K/sec ( +- 0.00% ) >> 77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%) >> 2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%) >> 2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%) >> 66,742,892 instructions # 0.86 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%) >> 12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%) >> 386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%) >> >> 0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% ) >> >> >> Additional 
testing: >> - [x] Linux x86_64 fastdebug, `tier1` >> - [x] Linux x86_64 fastdebug, `tier2` >> - [x] Linux x86_64 fastdebug, `tier3` >> - [x] Linux x86_32 fastdebug, `tier1` >> - [x] Linux x86_32 fastdebug, `tier2` >> - [x] Linux x86_32 fastdebug, `tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More comments Fiddly code, documentation update for the core subsystem, so: ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From sgehwolf at redhat.com Mon Feb 7 18:36:25 2022 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Mon, 07 Feb 2022 19:36:25 +0100 Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization In-Reply-To: <587acce6-dd30-1f78-caf6-17925c32cae6@oracle.com> References: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> <5dbfb77029a00d67542a9104855b2d98a3d8ce5e.camel@redhat.com> <587acce6-dd30-1f78-caf6-17925c32cae6@oracle.com> Message-ID: On Sun, 2022-02-06 at 20:16 -0800, Ioi Lam wrote: > Case (4) is the cause for the bug in JDK-8279484 > > Kubernetes set the cpu.cfs_quota_us to 0 (no limit) and cpu.shares to 2. > This means: > > - This container is guaranteed a minimum amount of CPU resources > - If no other containers are executing, this container can use as > much CPU as available on the host > - If other containers are executing, the amount of CPU available > to this container is (2 / (sum of cpu.shares of all active > containers)) > > > The fundamental problem with the current JVM implementation is that it > treats "CPU request" as a maximum value, the opposite of what Kubernetes > does. Because of this, in case (4), the JVM artificially limits itself > to a single CPU. This leads to CPU underutilization. I agree with your analysis. Key point is that in such a setup Kubernetes sets CPU shares value to 2. Though, it's a very specific case.
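To make the arithmetic in the quoted case concrete, here is a small sketch of the two readings of `cpu.shares` being contrasted. It is not the JDK implementation: the function names are hypothetical, and the divisor of 1024 shares per CPU with rounding up is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Relative reading: under contention, a container's slice of the host is
// its own shares divided by the sum of the shares of all active containers
// (the list is expected to include the container itself).
inline double contended_fraction(int own_shares, const std::vector<int>& all_active_shares) {
  const int total = std::accumulate(all_active_shares.begin(), all_active_shares.end(), 0);
  return total > 0 ? static_cast<double>(own_shares) / total : 1.0;
}

// Absolute reading (assumed here): shares are converted to a CPU count that is
// then treated as an upper bound, with 1024 shares per CPU, rounded up.
inline int cpus_from_shares(int shares, int host_cpus) {
  const int kSharesPerCpu = 1024;  // assumption for illustration
  const int count = static_cast<int>(std::ceil(static_cast<double>(shares) / kSharesPerCpu));
  return std::min(std::max(count, 1), host_cpus);
}
```

Under these assumptions, `cpus_from_shares(2, 16)` yields 1 even on an otherwise idle 16-CPU host, which is the underutilization described in the quoted analysis, while the relative reading only restricts the container when other containers are actually competing.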
In contrast to Kubernetes the JVM doesn't have insight into what other containers are doing (or how they are configured). It would, perhaps, be good to know what Kubernetes does for containers when the environment (i.e. other containers) changes. Do they get restarted? Restarted with different values for cpu shares? Either way, what are our options to fix this? Does it need fixing? * Should we no longer take cpu shares as a means to limit CPU into account? It would be a significant change to how previous JDKs worked. Maybe that wouldn't be such a bad idea :) * How likely is CPU underutilization to happen in practise? Considering the container is not the only container on the node, then according to your formula, it'll get one CPU or less anyway. Underutilization would, thus, only happen when it's an idle node with no other containers running. That would suggest to do nothing and let the user override it as they see fit. * Something else I'm missing? Thanks, Severin From hseigel at openjdk.java.net Mon Feb 7 21:36:30 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 7 Feb 2022 21:36:30 GMT Subject: RFR: 8281400: Remove unused wcslen() function from globalDefinitions_gcc.hpp Message-ID: Please review this small change to remove the unused wcslen() function. This change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows. 
Thanks, Harold ------------- Commit messages: - 8281400: Remove unused wcslen() function from globalDefinitions_gcc.hpp Changes: https://git.openjdk.java.net/jdk/pull/7374/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7374&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281400 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7374.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7374/head:pull/7374 PR: https://git.openjdk.java.net/jdk/pull/7374 From dcubed at openjdk.java.net Mon Feb 7 21:54:08 2022 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 7 Feb 2022 21:54:08 GMT Subject: RFR: 8281400: Remove unused wcslen() function from globalDefinitions_gcc.hpp In-Reply-To: References: Message-ID: On Mon, 7 Feb 2022 21:30:32 GMT, Harold Seigel wrote: > Please review this small change to remove the unused wcslen() function. This change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows. > > Thanks, Harold "grep" concurs that this function is only used in os_windows.cpp. Thumbs up. This is a trivial fix. Not your problem, but the XLC header has the same issue: src/hotspot/share/utilities/globalDefinitions_xlc.hpp:inline int wcslen(const jchar* x) { return wcslen((const wchar_t*)x); } ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7374 From duke at openjdk.java.net Tue Feb 8 01:29:10 2022 From: duke at openjdk.java.net (KIRIYAMA Takuya) Date: Tue, 8 Feb 2022 01:29:10 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. In-Reply-To: References: Message-ID: On Wed, 26 Jan 2022 06:41:41 GMT, KIRIYAMA Takuya wrote: > I think JFR should report an error message and the JVM should shut down safely instead of failing a guarantee.
> > For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops the JVM, as shown below, > by using JfrJavaSupport::abort(). > > [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... > > I modified StreamWriterHost to call JfrJavaSupport::abort() instead of failing the guarantee. > I added an argument to JfrJavaSupport::abort() which tells os::abort() not to dump core, > because there is no space left on the device. > Could you please review the fix? Hi, JFR team. Could somebody please review this fix for 8280684? ------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From dholmes at openjdk.java.net Tue Feb 8 02:38:10 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 8 Feb 2022 02:38:10 GMT Subject: RFR: 8281400: Remove unused wcslen() function from globalDefinitions_gcc.hpp In-Reply-To: References: Message-ID: On Mon, 7 Feb 2022 21:30:32 GMT, Harold Seigel wrote: > Please review this small change to remove the unused wcslen() function. This change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows. > > Thanks, Harold Please fix both files and update the JBS issue title etc. Thanks.
------------- PR: https://git.openjdk.java.net/jdk/pull/7374 From jiefu at openjdk.java.net Tue Feb 8 04:00:04 2022 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 8 Feb 2022 04:00:04 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v3] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Mon, 7 Feb 2022 15:42:46 GMT, Aleksey Shipilev wrote: >> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. >> >> The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. >> >> This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration. >> >> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around. 
>> >> I think it is fairly complete, and so would like to solicit more feedback and testing here. >> >> Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions: >> >> >> compiler.compiler: +77% >> compiler.sunflow: +69% >> compress: +166% >> crypto.rsa: +15% >> crypto.signverify: +70% >> mpegaudio: +8% >> serial: +50% >> sunflow: +57% >> xml.transform: +61% >> xml.validation: +43% >> >> >> My new `java.lang.invoke` benchmarks improve a lot as well: >> >> >> Benchmark Mode Cnt Score Error Units >> >> # Mainline >> MHInvoke.methodHandle avgt 5 799.671 ± 9.087 ns/op >> MHInvoke.plain avgt 5 261.947 ± 1.421 ns/op >> VHGet.plain avgt 5 231.372 ± 3.044 ns/op >> VHGet.varHandle avgt 5 924.880 ± 6.026 ns/op >> >> # This WIP >> MHInvoke.methodHandle avgt 5 240.456 ± 3.931 ns/op >> MHInvoke.plain avgt 5 70.851 ± 1.986 ns/op >> VHGet.plain avgt 5 52.506 ± 3.768 ns/op >> VHGet.varHandle avgt 5 335.785 ± 4.398 ns/op >> >> >> It also palpably improves startup even on small HelloWorld, _even when compilers are present_: >> >> >> $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% ) >> 96 context-switches # 4.353 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 333.181 /sec ( +- 0.32% ) >> 2,437 page-faults # 110.469 K/sec ( +- 0.00% ) >> 78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%) >> 2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%) >> 2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%) >> 67,296,528 instructions # 0.85 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%) >> 12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%) >> 384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%) >> >> 0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41%
) >> >> $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% ) >> 98 context-switches # 4.519 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 339.292 /sec ( +- 0.31% ) >> 2,434 page-faults # 111.755 K/sec ( +- 0.00% ) >> 77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%) >> 2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%) >> 2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%) >> 66,742,892 instructions # 0.86 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%) >> 12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%) >> 386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%) >> >> 0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% ) >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1` >> - [x] Linux x86_64 fastdebug, `tier2` >> - [x] Linux x86_64 fastdebug, `tier3` >> - [x] Linux x86_32 fastdebug, `tier1` >> - [x] Linux x86_32 fastdebug, `tier2` >> - [x] Linux x86_32 fastdebug, `tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More comments src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp line 739: > 737: __ cmpptr(Address(thread, JavaThread::shadow_zone_safe_limit()), (int32_t)NULL_WORD); > 738: __ jcc(Assembler::notEqual, L_good_limit); > 739: __ stop("shadow zone safe limit is not initialized"); This indentation seems strange to me. Also see lines {745, 762}. And we'd better update the copyright year for all touched files. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From ioi.lam at oracle.com Tue Feb 8 06:29:52 2022 From: ioi.lam at oracle.com (Ioi Lam) Date: Mon, 7 Feb 2022 22:29:52 -0800 Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization In-Reply-To: References: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> <5dbfb77029a00d67542a9104855b2d98a3d8ce5e.camel@redhat.com> <587acce6-dd30-1f78-caf6-17925c32cae6@oracle.com> Message-ID: On 2022/02/07 10:36, Severin Gehwolf wrote: > On Sun, 2022-02-06 at 20:16 -0800, Ioi Lam wrote: >> Case (4) is the cause for the bug in JDK-8279484 >> >> Kubernetes set the cpu.cfs_quota_us to 0 (no limit) and cpu.shares to 2. >> This means: >> >> - This container is guaranteed a minimum amount of CPU resources >> - If no other containers are executing, this container can use as >> much CPU as available on the host >> - If other containers are executing, the amount of CPU available >> to this container is (2 / (sum of cpu.shares of all active >> containers)) >> >> >> The fundamental problem with the current JVM implementation is that it >> treats "CPU request" as a maximum value, the opposite of what Kubernetes >> does. Because of this, in case (4), the JVM artificially limits itself >> to a single CPU. This leads to CPU underutilization. > I agree with your analysis. Key point is that in such a setup > Kubernetes sets CPU shares value to 2. Though, it's a very specific > case. > > In contrast to Kubernetes the JVM doesn't have insight into what other > containers are doing (or how they are configured). It would, perhaps, > be good to know what Kubernetes does for containers when the > environment (i.e. other containers) changes. Do they get restarted? > Restarted with different values for cpu shares? My understanding is that Kubernetes will try to do load balancing and may migrate the containers.
According to this: https://stackoverflow.com/questions/64891872/kubernetes-dynamic-configurationn-of-cpu-resource-limit If you change the CPU limits, a currently running container will be shut down and restarted (using the new limit), and may be relocated to a different host if necessary. I think this means that a JVM process doesn't need to worry about the CPU limit changing during its lifetime :-) > Either way, what are our options to fix this? Does it need fixing? > > * Should we no longer take cpu shares as a means to limit CPU into > account? It would be a significant change to how previous JDKs > worked. Maybe that wouldn't be such a bad idea :) I think we should get rid of it. This feature was designed to work with Kubernetes, but has no effect in most cases. The only time it takes effect (when no resource limits are set) it does the opposite of what the user expects. Also, the current implementation is really tied to specific behaviors of Kubernetes + docker (the 1024 and 100 constants). This will cause problems with other container/orchestration software that use different algorithms and constants. > * How likely is CPU underutilization to happen in practise? > Considering the container is not the only container on the node, > then according to your formula, it'll get one CPU or less anyway. > Underutilization would, thus, only happen when it's an idle node > with no other containers running. That would suggest to do nothing > and let the user override it as they see fit. I think under utilization happens when the containers have a bursty usage pattern. If other containers do not fully utilize their CPU quotas, we should distribute the unused CPUs to the busy containers. Thanks - Ioi > * Something else I'm missing? 
> > Thanks, > Severin > From shade at openjdk.java.net Tue Feb 8 07:18:52 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 8 Feb 2022 07:18:52 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v4] In-Reply-To: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: > This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. > > The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. > > This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration. > > I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around. 
> > I think it is fairly complete, and so would like to solicit more feedback and testing here. > > Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions: > > > compiler.compiler: +77% > compiler.sunflow: +69% > compress: +166% > crypto.rsa: +15% > crypto.signverify: +70% > mpegaudio: +8% > serial: +50% > sunflow: +57% > xml.transform: +61% > xml.validation: +43% > > > My new `java.lang.invoke` benchmarks improve a lot as well: > > > Benchmark Mode Cnt Score Error Units > > # Mainline > MHInvoke.methodHandle avgt 5 799.671 ± 9.087 ns/op > MHInvoke.plain avgt 5 261.947 ± 1.421 ns/op > VHGet.plain avgt 5 231.372 ± 3.044 ns/op > VHGet.varHandle avgt 5 924.880 ± 6.026 ns/op > > # This WIP > MHInvoke.methodHandle avgt 5 240.456 ± 3.931 ns/op > MHInvoke.plain avgt 5 70.851 ± 1.986 ns/op > VHGet.plain avgt 5 52.506 ± 3.768 ns/op > VHGet.varHandle avgt 5 335.785 ± 4.398 ns/op > > > It also palpably improves startup even on small HelloWorld, _even when compilers are present_: > > > $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null > > Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs): > > 22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% ) > 96 context-switches # 4.353 K/sec ( +- 0.07% ) > 7 cpu-migrations # 333.181 /sec ( +- 0.32% ) > 2,437 page-faults # 110.469 K/sec ( +- 0.00% ) > 78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%) > 2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%) > 2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%) > 67,296,528 instructions # 0.85 insn per cycle > # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%) > 12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%) > 384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%) > > 0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% ) > > $ perf stat -r 5000 build/interp-bang/bin/java
-Xms128m -Xmx128m Hello > /dev/null > > Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs): > > 21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% ) > 98 context-switches # 4.519 K/sec ( +- 0.07% ) > 7 cpu-migrations # 339.292 /sec ( +- 0.31% ) > 2,434 page-faults # 111.755 K/sec ( +- 0.00% ) > 77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%) > 2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%) > 2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%) > 66,742,892 instructions # 0.86 insn per cycle > # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%) > 12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%) > 386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%) > > 0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% ) > > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1` > - [x] Linux x86_64 fastdebug, `tier2` > - [x] Linux x86_64 fastdebug, `tier3` > - [x] Linux x86_32 fastdebug, `tier1` > - [x] Linux x86_32 fastdebug, `tier2` > - [x] Linux x86_32 fastdebug, `tier3` Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: - Indents - Drop the test group definition - Update copyrights ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7247/files - new: https://git.openjdk.java.net/jdk/pull/7247/files/2c710882..ffd560ab Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7247&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7247&range=02-03 Stats: 41 lines in 4 files changed: 0 ins; 28 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/7247.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7247/head:pull/7247 PR: https://git.openjdk.java.net/jdk/pull/7247 From shade at openjdk.java.net Tue Feb 8 07:18:54 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 8 Feb 2022 07:18:54 GMT Subject: RFR: 
8072070: Improve interpreter stack banging [v3] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Tue, 8 Feb 2022 03:57:07 GMT, Jie Fu wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> More comments > > src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp line 739: > >> 737: __ cmpptr(Address(thread, JavaThread::shadow_zone_safe_limit()), (int32_t)NULL_WORD); >> 738: __ jcc(Assembler::notEqual, L_good_limit); >> 739: __ stop("shadow zone safe limit is not initialized"); > > This indentation seems strange to me. > Also see lines {745, 762}. > > And we'd better update the copyright year for all touched files. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From chagedorn at openjdk.java.net Tue Feb 8 08:17:17 2022 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 8 Feb 2022 08:17:17 GMT Subject: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v4] In-Reply-To: References: Message-ID: > When printing the native stack trace on Linux (mostly done for hs_err files), it only prints the method with its parameters and a relative offset in the method: > > Stack: [0x00007f6e01739000,0x00007f6e0183a000], sp=0x00007f6e01838110, free space=1020k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 > V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec > V [libjvm.so+0x8303ef] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 > V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df > V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69 > V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d > V [libjvm.so+0x12091c9] JavaThread::run()+0x167 > 
V [libjvm.so+0x1206ada] Thread::call_run()+0x180 > V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f > > This makes it sometimes difficult to see where exactly the methods were called from and sometimes almost impossible when there are multiple invocations of the same method within one method. > > This patch improves this by providing source information (filename + line number) to the native stack traces on Linux similar to what's already done on Windows (see [JDK-8185712](https://bugs.openjdk.java.net/browse/JDK-8185712)): > > Stack: [0x00007f34fca18000,0x00007f34fcb19000], sp=0x00007f34fcb17110, free space=1020k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 (c1_Compilation.cpp:607) > V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec (c1_Compiler.cpp:250) > V [libjvm.so+0x8303ef] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 (compileBroker.cpp:2291) > V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df (compileBroker.cpp:1966) > V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69 (compilerThread.cpp:59) > V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d (thread.cpp:1297) > V [libjvm.so+0x12091c9] JavaThread::run()+0x167 (thread.cpp:1280) > V [libjvm.so+0x1206ada] Thread::call_run()+0x180 (thread.cpp:358) > V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f (os_linux.cpp:705) > > For Linux, we need to parse the debug symbols which are generated by GCC in DWARF - a standardized debugging format. This patch adds support for DWARF 4, the default of GCC 10.x, for 32 and 64 bit architectures (tested with x86_32, x86_64 and AArch64). DWARF 5 is not supported as it was still experimental and not generated for HotSpot. 
However, newer GCC versions may soon generate DWARF 5 by default, in which case either this parser needs to be extended or the HotSpot build needs to be configured to emit only DWARF 4. > > The code follows the parsing steps described in the official DWARF 4 spec: https://dwarfstd.org/doc/DWARF4.pdf > I added references to the corresponding sections throughout the code. However, I tried to explain the steps from the DWARF spec directly in the code (method names, comments etc.). This makes it possible to follow the code without diving deep into the spec. > > The comments at the `Dwarf` class in the `elf.hpp` file explain in more detail how a DWARF file is structured and how the parsing algorithm works to get to the filename and line number information. There are more class comments throughout the `elf.hpp` file about how different DWARF sections are structured and how the parsing algorithm needs to fetch the required information. Therefore, I will not repeat the exact workings of the algorithm here but refer to the code comments. I've tried to add as much information as possible to improve the readability. > > Generally, I've tried to stay away from adding any assertions as this code is almost always executed when already processing a VM error. Instead, the DWARF parser aims to just exit gracefully and possibly omit source information for a stack frame, rather than risk cutting the hs_err file short because an assertion failed. To debug failures, `-Xlog:dwarf` can be used with `info`, `debug` or `trace`, which provides logging messages throughout parsing. > > **Testing:** > Apart from manual testing, I've added two kinds of tests: > - A JTreg test: Spawns new VMs to let them crash in various ways. The test reads the created hs_err files to check if the DWARF parsing could correctly find the filename and line number.
For normal HotSpot files, I could not check against hardcoded filenames and line numbers as they are subject to change (especially line number can quickly become different). I therefore just added some sanity checks in the form of "found a non-empty file" and "found a non-zero line number". On top of that, I added tests that let the VM crash in custom C files (which will not change). This enables an additional verification of hardcoded filenames and line numbers. > - Gtests: Directly calling the `get_source()` method which initiates DWARF parsing. Tested some special cases, for example, having a buffer that is not big enough to store the filename. > > On top of that, there are also existing JTreg tests that call `-XX:NativeMemoryTracking=detail` which will print a native stack trace with the new source information. These tests were also run as part of the standard tier testing and can be considered as sanity tests for this implementation. > > To make tests work in our infrastructure or if some other setups want to have debug symbols at different locations, I've added support for an additional `_JVM_DWARF_PATH` environment variable. This variable can specify a path from which the DWARF symbol file should be read by the parser if the default locations do not contain debug symbols (required some `make` changes). This is similar to what's done on Windows with `_NT_SYMBOL_PATH`. The JTreg test, however, also works if there are no symbols available. In that case, the test just skips all the assertion checks for the filename and line number. > > I haven't run any specific performance testing as this new code is mainly executed when an error will exit the VM and only if symbol files are available (which is normally not the case when using Java release builds as a user). 
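The "bail out instead of assert" approach and the too-small-buffer case mentioned above can be illustrated with a minimal sketch. This is not the actual `get_source()` code from the patch; the helper name `copy_filename_or_bail` and its signature are hypothetical and only show the pattern of returning failure so that error reporting can continue without source information.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (an assumption, not HotSpot code): copy a filename into
// a caller-provided buffer. On any problem it reports failure instead of
// asserting, so a caller could simply omit source info for that frame.
bool copy_filename_or_bail(const char* filename, char* buf, std::size_t buf_len) {
  if (filename == nullptr || buf == nullptr || buf_len == 0) {
    return false;  // nothing sensible to do; bail out quietly
  }
  const std::size_t len = std::strlen(filename);
  if (len + 1 > buf_len) {
    buf[0] = '\0';  // leave an empty, well-formed string behind
    return false;   // buffer too small for the name plus terminating NUL
  }
  std::memcpy(buf, filename, len + 1);  // copy including the terminating NUL
  return true;
}
```

A caller in this sketch would print the frame without a "(file:line)" suffix whenever the helper returns false.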
> > Special thanks to @tschatzl for giving me some pointers to start based on his knowledge from a DWARF 2 parser he once wrote in Pascal and for discussing approaches on how to retrieve the source information and to @erikj79 for providing help for the changes required for `make`! > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Make dwarf tag NOT_PRODUCT ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7126/files - new: https://git.openjdk.java.net/jdk/pull/7126/files/698663b9..820f0da6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7126&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7126&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7126.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7126/head:pull/7126 PR: https://git.openjdk.java.net/jdk/pull/7126 From duke at openjdk.java.net Tue Feb 8 09:26:14 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Tue, 8 Feb 2022 09:26:14 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <32e7_CnkkIaj2GOsvi9mT-xzgLO8B60uHrzMEAZXHko=.2ea9eaff-39c6-4401-9820-4536f03d5ec7@github.com> Message-ID: On Mon, 7 Feb 2022 12:25:44 GMT, Andrew Haley wrote: >> PreserveFramePointer is doing some additional stuff. I'll give it a test to make sure everything still works with PreserveFramePointer fully set. It would make things easier just to force it set with rop protection on. > >> PreserveFramePointer is doing some additional stuff. I'll give it a test to make sure everything still works with PreserveFramePointer fully set. It would make things easier just to force it set with rop protection on. 
> > Using PreserveFramePointer greatly simplifies the testing matrix, and has little adverse performance impact beyond disallowing C2 from allocating FP as a scratch register. It also simplifies this patch, which would be a very Good Thing. Let's do it. Doing this caused 7 failures across a full jtreg run, namely: serviceability/sa/ClhsdbFindPC.java#xcomp-core vmTestbase/jit/misctests/fpustack/GraphApplet.java vmTestbase/nsk/jdi/MonitorWaitRequest/MonitorWaitRequest001/TestDescription.java vmTestbase/nsk/jdi/MonitorWaitedRequest/MonitorWaitedRequest001/TestDescription.java vmTestbase/nsk/jdwp/ThreadReference/ForceEarlyReturn/forceEarlyReturn002/forceEarlyReturn002.java vmTestbase/nsk/jdwp/ThreadReference/OwnedMonitorsStackDepthInfo/ownedMonitorsStackDepthInfo002/ownedMonitorsStackDepthInfo002.java vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine/TestDescription.java ....I'll investigate. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Tue Feb 8 09:44:16 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 8 Feb 2022 09:44:16 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <32e7_CnkkIaj2GOsvi9mT-xzgLO8B60uHrzMEAZXHko=.2ea9eaff-39c6-4401-9820-4536f03d5ec7@github.com> Message-ID: On Tue, 8 Feb 2022 09:22:39 GMT, Alan Hayward wrote: > Doing this caused 7 failures across a full jtreg run, namely: I'm glad we caught that. 
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From kbarrett at openjdk.java.net Tue Feb 8 10:09:48 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 8 Feb 2022 10:09:48 GMT Subject: RFR: 8280828: Improve invariants in NonblockingQueue::append [v2] In-Reply-To: <5RadwfEH_n0x_cLSSZePRdiQ5W6nRhfNJ_ns3ajDZtQ=.6a3a1eaa-55b3-4154-a5a3-6e2985a1ceaf@github.com> References: <5RadwfEH_n0x_cLSSZePRdiQ5W6nRhfNJ_ns3ajDZtQ=.6a3a1eaa-55b3-4154-a5a3-6e2985a1ceaf@github.com> Message-ID: > Please review this change to NonblockingQueue to improve invariants in the > append operation by making a change in try_pop. > > When taking the last entry in the queue, try_pop needs to do some cleanup of > the queue fields, setting them to NULL. The order of those cleanups doesn't > matter for correctness. However, setting first _head then _tail permits > append to assert that _head is NULL when it finds _tail was NULL. The current > order (set _tail first, then _head) doesn't permit such an assertion. > > Testing: > mach5 tier1-3 > > I also did lots of testing with this change included while investigating > JDK-8273383. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into append-invariant - minor comment fixes - append invariant ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7250/files - new: https://git.openjdk.java.net/jdk/pull/7250/files/4559ec8d..9648d183 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7250&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7250&range=00-01 Stats: 9830 lines in 385 files changed: 6406 ins; 1963 del; 1461 mod Patch: https://git.openjdk.java.net/jdk/pull/7250.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7250/head:pull/7250 PR: https://git.openjdk.java.net/jdk/pull/7250 From iwalulya at openjdk.java.net Tue Feb 8 11:14:12 2022 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 8 Feb 2022 11:14:12 GMT Subject: RFR: 8280828: Improve invariants in NonblockingQueue::append [v2] In-Reply-To: References: <5RadwfEH_n0x_cLSSZePRdiQ5W6nRhfNJ_ns3ajDZtQ=.6a3a1eaa-55b3-4154-a5a3-6e2985a1ceaf@github.com> Message-ID: On Tue, 8 Feb 2022 10:09:48 GMT, Kim Barrett wrote: >> Please review this change to NonblockingQueue to improve invariants in the >> append operation by making a change in try_pop. >> >> When taking the last entry in the queue, try_pop needs to do some cleanup of >> the queue fields, setting them to NULL. The order of those cleanups doesn't >> matter for correctness. However, setting first _head then _tail permits >> append to assert that _head is NULL when it finds _tail was NULL. The current >> order (set _tail first, then _head) doesn't permit such an assertion. >> >> Testing: >> mach5 tier1-3 >> >> I also did lots of testing with this change included while investigating >> JDK-8273383. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision:
>
> - Merge branch 'master' into append-invariant
> - minor comment fixes
> - append invariant

Lgtm!

-------------

Marked as reviewed by iwalulya (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7250

From sgehwolf at redhat.com Tue Feb 8 11:32:07 2022
From: sgehwolf at redhat.com (Severin Gehwolf)
Date: Tue, 08 Feb 2022 12:32:07 +0100
Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization
In-Reply-To:
References: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> <5dbfb77029a00d67542a9104855b2d98a3d8ce5e.camel@redhat.com> <587acce6-dd30-1f78-caf6-17925c32cae6@oracle.com>
Message-ID: <5d25e7ceeabd9186dd6fe5e9e6e04d0d11ef26c0.camel@redhat.com>

On Mon, 2022-02-07 at 22:29 -0800, Ioi Lam wrote:
> On 2022/02/07 10:36, Severin Gehwolf wrote:
> > On Sun, 2022-02-06 at 20:16 -0800, Ioi Lam wrote:
> > > Case (4) is the cause for the bug in JDK-8279484
> > >
> > > Kubernetes set the cpu.cfs_quota_us to 0 (no limit) and cpu.shares to 2.
> > > This means:
> > >
> > > - This container is guaranteed a minimum amount of CPU resources
> > > - If no other containers are executing, this container can use as
> > >   much CPU as available on the host
> > > - If other containers are executing, the amount of CPU available
> > >   to this container is (2 / (sum of cpu.shares of all active
> > >   containers))
> > >
> > >
> > > The fundamental problem with the current JVM implementation is that it
> > > treats "CPU request" as a maximum value, the opposite of what Kubernetes
> > > does. Because of this, in case (4), the JVM artificially limits itself
> > > to a single CPU. This leads to CPU underutilization.
> > I agree with your analysis. Key point is that in such a setup
> > Kubernetes sets CPU shares value to 2. Though, it's a very specific
> > case.
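The effect described for case (4) can be sketched as follows. This is a simplified illustration, assuming the JVM derives a CPU count from cpu.shares by dividing by the 1024 constant mentioned in this thread and rounding up; the function names are hypothetical and this is not the JDK's actual container code.

```cpp
#include <algorithm>
#include <cmath>

// Assumption taken from the thread: 1024 shares are treated as one CPU.
constexpr int kPerCpuShares = 1024;

// Sketch of the "shares as an upper limit" interpretation under discussion.
// In this sketch, shares <= 0 stands for "no shares configured".
int cpus_from_shares_as_limit(int shares, int host_cpus) {
  if (shares <= 0) {
    return host_cpus;
  }
  const int count = static_cast<int>(
      std::ceil(static_cast<double>(shares) / kPerCpuShares));
  return std::min(std::max(count, 1), host_cpus);
}

// Sketch of the alternative being proposed: do not derive a limit from shares.
int cpus_ignoring_shares(int /*shares*/, int host_cpus) {
  return host_cpus;
}
```

With cpu.shares set to 2 on a 50-core host, the first function yields a single CPU while the second yields all 50, which is the underutilization versus starvation trade-off debated in this thread.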
> >
> > In contrast to Kubernetes the JVM doesn't have insight into what other
> > containers are doing (or how they are configured). It would, perhaps,
> > be good to know what Kubernetes does for containers when the
> > environment (i.e. other containers) changes. Do they get restarted?
> > Restarted with different values for cpu shares?
>
> My understanding is that Kubernetes will try to do load balancing and
> may migrate the containers. According to this:
>
> https://stackoverflow.com/questions/64891872/kubernetes-dynamic-configurationn-of-cpu-resource-limit
>
> If you change the CPU limits, a currently running container will be shut
> down and restarted (using the new limit), and may be relocated to a
> different host if necessary.
>
> I think this means that a JVM process doesn't need to worry about the
> CPU limit changing during its lifetime :-)
> > Either way, what are our options to fix this? Does it need fixing?
> >
> >   * Should we no longer take cpu shares as a means to limit CPU into
> >     account? It would be a significant change to how previous JDKs
> >     worked. Maybe that wouldn't be such a bad idea :)
>
> I think we should get rid of it. This feature was designed to work with
> Kubernetes, but has no effect in most cases. The only time it takes
> effect (when no resource limits are set) it does the opposite of what
> the user expects.

I tend to agree. We should start with a CSR review of this, though, as it would be a behavioural change as compared to previous versions of the JDK.

> Also, the current implementation is really tied to specific behaviors of
> Kubernetes + docker (the 1024 and 100 constants). This will cause
> problems with other container/orchestration software that use different
> algorithms and constants.

There are other container orchestration frameworks, like Mesos, which behave in a similar way (1024 constant is being used). The good news is that mesos seems to have moved to a hard-limit default.
See:
https://mesosphere.github.io/field-notes/faqs/utilization.html
https://mesos.apache.org/documentation/latest/quota/#deprecated-quota-guarantees

>
> >   * How likely is CPU underutilization to happen in practise?
> >     Considering the container is not the only container on the node,
> >     then according to your formula, it'll get one CPU or less anyway.
> >     Underutilization would, thus, only happen when it's an idle node
> >     with no other containers running. That would suggest to do nothing
> >     and let the user override it as they see fit.
>
> I think under utilization happens when the containers have a bursty
> usage pattern. If other containers do not fully utilize their CPU
> quotas, we should distribute the unused CPUs to the busy containers.

Right, but this isn't really something the JVM process should care about. It's really a core feature of the orchestration framework to do that. All we could do is to not limit CPU for those cases. On the other hand there is the risk of resource starvation too. Consider a node with many cores, 50 say, and a very small cpu share setting via container limits. The experience running a JVM application in such a set up would be very mediocre as the JVM thinks it can use 50 cores (100% of the time), yet it would only get this when the rest of the containers/universe is idle.

Thanks,
Severin

From iwalulya at openjdk.java.net Tue Feb 8 12:12:06 2022
From: iwalulya at openjdk.java.net (Ivan Walulya)
Date: Tue, 8 Feb 2022 12:12:06 GMT
Subject: RFR: 8280828: Improve invariants in NonblockingQueue::append [v2]
In-Reply-To:
References: <5RadwfEH_n0x_cLSSZePRdiQ5W6nRhfNJ_ns3ajDZtQ=.6a3a1eaa-55b3-4154-a5a3-6e2985a1ceaf@github.com>
Message-ID:

On Tue, 8 Feb 2022 10:09:48 GMT, Kim Barrett wrote:

>> Please review this change to NonblockingQueue to improve invariants in the
>> append operation by making a change in try_pop.
>> >> When taking the last entry in the queue, try_pop needs to do some cleanup of >> the queue fields, setting them to NULL. The order of those cleanups doesn't >> matter for correctness. However, setting first _head then _tail permits >> append to assert that _head is NULL when it finds _tail was NULL. The current >> order (set _tail first, then _head) doesn't permit such an assertion. >> >> Testing: >> mach5 tier1-3 >> >> I also did lots of testing with this change included while investigating >> JDK-8273383. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into append-invariant > - minor comment fixes > - append invariant Not part of this PR, but we need to add a comment about `push/append` being susceptible to ABA behavior as discovered in JDK-8273383. ------------- PR: https://git.openjdk.java.net/jdk/pull/7250 From duke at openjdk.java.net Tue Feb 8 12:32:08 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Tue, 8 Feb 2022 12:32:08 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v4] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: <-sdR46GHtobc9KyphiiBn6TLfrmR4dDDpXwpaUsmuJg=.c6baf0ee-a06d-4fcb-a75a-349dc5646e18@github.com> On Tue, 8 Feb 2022 07:18:52 GMT, Aleksey Shipilev wrote: >> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. >> >> The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. 
This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well.
>>
>> This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration.
>>
>> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around.
>>
>> I think it is fairly complete, and so would like to solicit more feedback and testing here.
>>
>> Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions:
>>
>>
>> compiler.compiler: +77%
>> compiler.sunflow: +69%
>> compress: +166%
>> crypto.rsa: +15%
>> crypto.signverify: +70%
>> mpegaudio: +8%
>> serial: +50%
>> sunflow: +57%
>> xml.transform: +61%
>> xml.validation: +43%
>>
>>
>> My new `java.lang.invoke` benchmarks improve a lot as well:
>>
>>
>> Benchmark               Mode  Cnt    Score   Error  Units
>>
>> # Mainline
>> MHInvoke.methodHandle   avgt    5  799.671 ± 9.087  ns/op
>> MHInvoke.plain          avgt    5  261.947 ± 1.421  ns/op
>> VHGet.plain             avgt    5  231.372 ± 3.044  ns/op
>> VHGet.varHandle         avgt    5  924.880 ± 6.026  ns/op
>>
>> # This WIP
>> MHInvoke.methodHandle   avgt    5  240.456 ± 3.931  ns/op
>> MHInvoke.plain          avgt    5   70.851 ± 1.986  ns/op
>> VHGet.plain             avgt    5   52.506 ± 3.768  ns/op
>> VHGet.varHandle         avgt    5  335.785 ± 4.398  ns/op
>>
>>
>> It also palpably improves startup even on small HelloWorld, _even when compilers are present_:
>>
>>
>> $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null
>>
>> Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs):
>>
>>          22.06 msec task-clock                #    1.030 CPUs utilized            ( +- 0.04% )
>>             96      context-switches          #    4.353 K/sec                    ( +- 0.07% )
>>              7      cpu-migrations            #  333.181 /sec                     ( +- 0.32% )
>>          2,437      page-faults               #  110.469 K/sec                    ( +- 0.00% )
>>     78,763,038      cycles                    #    3.571 GHz                      ( +- 0.05% ) (77.30%)
>>      2,107,182      stalled-cycles-frontend   #    2.68% frontend cycles idle     ( +- 0.41% ) (77.40%)
>>      2,235,371      stalled-cycles-backend    #    2.84% backend cycles idle      ( +- 1.05% ) (71.39%)
>>     67,296,528      instructions              #    0.85 insn per cycle
>>                                               #    0.03 stalled cycles per insn   ( +- 0.03% ) (89.79%)
>>     12,483,022      branches                  #  565.911 M/sec                    ( +- 0.01% ) (99.73%)
>>        384,412      branch-misses             #    3.08% of all branches          ( +- 0.07% ) (85.91%)
>>
>>      0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% )
>>
>> $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null
>>
>> Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs):
>>
>>          21.78 msec task-clock                #    1.031 CPUs utilized            ( +- 0.05% )
>>             98      context-switches          #    4.519 K/sec                    ( +- 0.07% )
>>              7      cpu-migrations            #  339.292 /sec                     ( +- 0.31% )
>>          2,434      page-faults               #  111.755 K/sec                    ( +- 0.00% )
>>     77,746,317      cycles                    #    3.569 GHz                      ( +- 0.05% ) (76.94%)
>>      2,143,121      stalled-cycles-frontend   #    2.76% frontend cycles idle     ( +- 0.45% ) (76.03%)
>>      2,059,440      stalled-cycles-backend    #    2.65% backend cycles idle      ( +- 1.11% ) (71.82%)
>>     66,742,892      instructions              #    0.86 insn per cycle
>>                                               #    0.03 stalled cycles per insn   ( +- 0.03% ) (91.40%)
>>     12,494,797      branches                  #  573.634 M/sec                    ( +- 0.01% ) (99.80%)
>>        386,145      branch-misses             #    3.09% of all branches          ( +- 0.08% ) (85.56%)
>>
>>      0.0211278 +- 0.0000877 seconds
time elapsed ( +- 0.42% ) >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1` >> - [x] Linux x86_64 fastdebug, `tier2` >> - [x] Linux x86_64 fastdebug, `tier3` >> - [x] Linux x86_32 fastdebug, `tier1` >> - [x] Linux x86_32 fastdebug, `tier2` >> - [x] Linux x86_32 fastdebug, `tier3` > > Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: > > - Indents > - Drop the test group definition > - Update copyrights src/hotspot/share/runtime/stackOverflow.hpp line 121: > 119: // | shadow zone > 120: // | > 121: // -- --- <-- shadow_zone_growth_watermark() Hi, should the `watermark` be somewhere below (regarding address) the last frame instead? ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From kbarrett at openjdk.java.net Tue Feb 8 12:58:06 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 8 Feb 2022 12:58:06 GMT Subject: RFR: 8280828: Improve invariants in NonblockingQueue::append [v2] In-Reply-To: References: <5RadwfEH_n0x_cLSSZePRdiQ5W6nRhfNJ_ns3ajDZtQ=.6a3a1eaa-55b3-4154-a5a3-6e2985a1ceaf@github.com> Message-ID: <0TXYYreM8RybLJ_4l__SPdOJLEeNAb9EJ_VSlDzTXRo=.5381dd81-a019-4806-8693-0572a9ee8b98@github.com> On Tue, 8 Feb 2022 12:08:51 GMT, Ivan Walulya wrote: > Not part of this PR, but we need to add a comment about `push/append` being susceptible to ABA behavior as discovered in JDK-8273383. 
See https://bugs.openjdk.java.net/browse/JDK-8280832 ------------- PR: https://git.openjdk.java.net/jdk/pull/7250 From tschatzl at openjdk.java.net Tue Feb 8 13:16:04 2022 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 8 Feb 2022 13:16:04 GMT Subject: RFR: 8280828: Improve invariants in NonblockingQueue::append [v2] In-Reply-To: References: <5RadwfEH_n0x_cLSSZePRdiQ5W6nRhfNJ_ns3ajDZtQ=.6a3a1eaa-55b3-4154-a5a3-6e2985a1ceaf@github.com> Message-ID: On Tue, 8 Feb 2022 10:09:48 GMT, Kim Barrett wrote: >> Please review this change to NonblockingQueue to improve invariants in the >> append operation by making a change in try_pop. >> >> When taking the last entry in the queue, try_pop needs to do some cleanup of >> the queue fields, setting them to NULL. The order of those cleanups doesn't >> matter for correctness. However, setting first _head then _tail permits >> append to assert that _head is NULL when it finds _tail was NULL. The current >> order (set _tail first, then _head) doesn't permit such an assertion. >> >> Testing: >> mach5 tier1-3 >> >> I also did lots of testing with this change included while investigating >> JDK-8273383. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into append-invariant > - minor comment fixes > - append invariant Lgtm. src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 200: > 198: // cmpxchg indicates a concurrent operation updated _head first. That > 199: // could be either a push/append or a try_pop in [Clause 1b]. > 200: Atomic::cmpxchg(&_head, result, (T*)NULL); These `NULL`s could be replaced by `nullptr`. ------------- Marked as reviewed by tschatzl (Reviewer). 
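The ordering argument in the quoted description (set _head first, then _tail) can be checked with a small model. This is a deliberately simplified, single-threaded sketch and not the real NonblockingQueue: it only records which (head, tail) pair would be visible between the two cleanup stores in try_pop, and whether the invariant "tail is NULL implies head is NULL" holds for that pair. All type and function names are hypothetical.

```cpp
// Simplified model of the two queue fields (an assumption for illustration).
struct QueueFields {
  const void* head;
  const void* tail;
};

enum class CleanupOrder { HeadThenTail, TailThenHead };

// Returns the state another thread could observe after the first cleanup
// store has happened but before the second one.
QueueFields state_between_stores(QueueFields q, CleanupOrder order) {
  if (order == CleanupOrder::HeadThenTail) {
    q.head = nullptr;  // first store clears head; tail still points at the node
  } else {
    q.tail = nullptr;  // first store clears tail; head still points at the node
  }
  return q;
}

// The invariant append would like to assert: tail == NULL implies head == NULL.
bool append_invariant_holds(const QueueFields& q) {
  return q.tail != nullptr || q.head == nullptr;
}
```

In this model the intermediate state for head-then-tail still has a non-NULL tail, so the implication holds trivially, while tail-then-head exposes a NULL tail with a non-NULL head, which is the state that rules out the assertion.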
PR: https://git.openjdk.java.net/jdk/pull/7250 From hseigel at openjdk.java.net Tue Feb 8 13:40:29 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 8 Feb 2022 13:40:29 GMT Subject: RFR: 8281400: Remove unused wcslen() function [v2] In-Reply-To: References: Message-ID: > Please review this small change to remove the unused wcslen() function. This change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows. > > Thanks, Harold Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: globalDefinitions_xlc.hpp change ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7374/files - new: https://git.openjdk.java.net/jdk/pull/7374/files/82015268..4c3375ce Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7374&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7374&range=00-01 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7374.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7374/head:pull/7374 PR: https://git.openjdk.java.net/jdk/pull/7374 From duke at openjdk.java.net Tue Feb 8 15:40:44 2022 From: duke at openjdk.java.net (Bhavana-Kilambi) Date: Tue, 8 Feb 2022 15:40:44 GMT Subject: RFR: 8280007: Enable Neoverse N1 optimizations for Arm Neoverse V1 & N2 Message-ID: <5-WQEPc2lrSR_d0pVtsoFDT45Je1TJtJAdxAiBbEc9U=.6adf8246-dd10-4518-bd91-67e6fdd6eed9@github.com> As Arm Neoverse V1 and N2s will benefit from the same optimizations as Neoverse N1 does, it should have OnSpinWaitInst/OnSpinWaitInstCount defaults set to "isb"/1 and UseSIMDForMemoryOps default set to true. This patch sets these flags accordingly for both V1 and N2 architectures. 
------------- Commit messages: - 8280007: Enable Neoverse N1 optimizations for Arm Neoverse V1 & N2 Changes: https://git.openjdk.java.net/jdk/pull/7383/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7383&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8280007 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/7383.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7383/head:pull/7383 PR: https://git.openjdk.java.net/jdk/pull/7383 From lucy at openjdk.java.net Tue Feb 8 15:55:09 2022 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 8 Feb 2022 15:55:09 GMT Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames [v3] In-Reply-To: References: Message-ID: <5Sl9zEnZBmbo31zfsGuZxLhF_fZtEmWMxxCdJVK3vo8=.ab417d6c-66a3-4712-b807-4992ed17805b@github.com> On Fri, 4 Feb 2022 15:45:48 GMT, Martin Doerr wrote: >> s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix sender_sp. Changes look good. As you mentioned, jtreg tests are green. ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7312 From hseigel at openjdk.java.net Tue Feb 8 16:04:13 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 8 Feb 2022 16:04:13 GMT Subject: Integrated: 8281400: Remove unused wcslen() function In-Reply-To: References: Message-ID: On Mon, 7 Feb 2022 21:30:32 GMT, Harold Seigel wrote: > Please review this small change to remove the unused wcslen() function. This change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows. > > Thanks, Harold This pull request has now been integrated. 
Changeset: 380378c5 Author: Harold Seigel URL: https://git.openjdk.java.net/jdk/commit/380378c551b4243ef72d868571f725b390e12124 Stats: 11 lines in 2 files changed: 0 ins; 9 del; 2 mod 8281400: Remove unused wcslen() function Reviewed-by: dcubed, coleenp, lfoltan ------------- PR: https://git.openjdk.java.net/jdk/pull/7374 From coleenp at openjdk.java.net Tue Feb 8 16:04:13 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 8 Feb 2022 16:04:13 GMT Subject: RFR: 8281400: Remove unused wcslen() function [v2] In-Reply-To: References: Message-ID: On Tue, 8 Feb 2022 13:40:29 GMT, Harold Seigel wrote: >> Please review this small change to remove the unused wcslen() function. This change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows. >> >> Thanks, Harold > > Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: > > globalDefinitions_xlc.hpp change Looks good + trivial. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7374 From lfoltan at openjdk.java.net Tue Feb 8 16:04:13 2022 From: lfoltan at openjdk.java.net (Lois Foltan) Date: Tue, 8 Feb 2022 16:04:13 GMT Subject: RFR: 8281400: Remove unused wcslen() function [v2] In-Reply-To: References: Message-ID: <80UI0jRH6YMI0QXv-1HR2H6FUk7WPbk1VephvEGmXa4=.0da23c18-9600-4214-a6a9-7c1f35ff1bfa@github.com> On Tue, 8 Feb 2022 13:40:29 GMT, Harold Seigel wrote: >> Please review this small change to remove the unused wcslen() function. This change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows. >> >> Thanks, Harold > > Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: > > globalDefinitions_xlc.hpp change Looks good. Lois ------------- Marked as reviewed by lfoltan (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7374 From hseigel at openjdk.java.net Tue Feb 8 16:04:13 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 8 Feb 2022 16:04:13 GMT Subject: RFR: 8281400: Remove unused wcslen() function [v2] In-Reply-To: References: Message-ID: <9IjrQDJk7oz5LhM9uSTRo3TLRJ9Bs5s_gnW2zG5UXb4=.378d4b4f-2669-4ecd-a3d1-38879769be2d@github.com> On Tue, 8 Feb 2022 13:40:29 GMT, Harold Seigel wrote: >> Please review this small change to remove the unused wcslen() function. This change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows. >> >> Thanks, Harold > > Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: > > globalDefinitions_xlc.hpp change Thanks Dan, Coleen, and Lois for the reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/7374 From duke at openjdk.java.net Tue Feb 8 16:28:02 2022 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 8 Feb 2022 16:28:02 GMT Subject: RFR: 8280007: Enable Neoverse N1 optimizations for Arm Neoverse V1 & N2 In-Reply-To: <5-WQEPc2lrSR_d0pVtsoFDT45Je1TJtJAdxAiBbEc9U=.6adf8246-dd10-4518-bd91-67e6fdd6eed9@github.com> References: <5-WQEPc2lrSR_d0pVtsoFDT45Je1TJtJAdxAiBbEc9U=.6adf8246-dd10-4518-bd91-67e6fdd6eed9@github.com> Message-ID: On Tue, 8 Feb 2022 15:33:20 GMT, Bhavana-Kilambi wrote: > As Arm Neoverse V1 and N2s will benefit from the same optimizations as Neoverse N1 does, it should have OnSpinWaitInst/OnSpinWaitInstCount defaults set to "isb"/1 and UseSIMDForMemoryOps default set to true. > This patch sets these flags accordingly for both V1 and N2 architectures. Lgtm ------------- Marked as reviewed by eastig at github.com (no known OpenJDK username). 
PR: https://git.openjdk.java.net/jdk/pull/7383 From phh at openjdk.java.net Tue Feb 8 16:53:09 2022 From: phh at openjdk.java.net (Paul Hohensee) Date: Tue, 8 Feb 2022 16:53:09 GMT Subject: RFR: 8280007: Enable Neoverse N1 optimizations for Arm Neoverse V1 & N2 In-Reply-To: <5-WQEPc2lrSR_d0pVtsoFDT45Je1TJtJAdxAiBbEc9U=.6adf8246-dd10-4518-bd91-67e6fdd6eed9@github.com> References: <5-WQEPc2lrSR_d0pVtsoFDT45Je1TJtJAdxAiBbEc9U=.6adf8246-dd10-4518-bd91-67e6fdd6eed9@github.com> Message-ID: On Tue, 8 Feb 2022 15:33:20 GMT, Bhavana-Kilambi wrote: > As Arm Neoverse V1 and N2s will benefit from the same optimizations as Neoverse N1 does, it should have OnSpinWaitInst/OnSpinWaitInstCount defaults set to "isb"/1 and UseSIMDForMemoryOps default set to true. > This patch sets these flags accordingly for both V1 and N2 architectures. Lgtm. ------------- Marked as reviewed by phh (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7383 From rrich at openjdk.java.net Tue Feb 8 16:57:06 2022 From: rrich at openjdk.java.net (Richard Reingruber) Date: Tue, 8 Feb 2022 16:57:06 GMT Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames [v3] In-Reply-To: References: Message-ID: On Fri, 4 Feb 2022 15:45:48 GMT, Martin Doerr wrote: >> s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix sender_sp. Hi Martin, changes look reasonable. `thread_linux_s390.cpp` needs copyright header update. Cheers, Richard. ------------- Marked as reviewed by rrich (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7312 From mdoerr at openjdk.java.net Tue Feb 8 17:09:51 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 8 Feb 2022 17:09:51 GMT Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames [v4] In-Reply-To: References: Message-ID: > s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Update Copyright years. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7312/files - new: https://git.openjdk.java.net/jdk/pull/7312/files/6d9446a8..31f5aa6a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7312&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7312&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7312.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7312/head:pull/7312 PR: https://git.openjdk.java.net/jdk/pull/7312 From mdoerr at openjdk.java.net Tue Feb 8 17:09:53 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 8 Feb 2022 17:09:53 GMT Subject: RFR: 8281061: [s390] JFR runs into assertions while validating interpreter frames [v3] In-Reply-To: References: Message-ID: On Fri, 4 Feb 2022 15:45:48 GMT, Martin Doerr wrote: >> s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix sender_sp. Copyright updated. Thank you for the reviews! 
------------- PR: https://git.openjdk.java.net/jdk/pull/7312 From shade at openjdk.java.net Tue Feb 8 17:24:41 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 8 Feb 2022 17:24:41 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v5] In-Reply-To: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: > This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. > > The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. > > This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration. > > I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around. 
> > I think it is fairly complete, and so would like to solicit more feedback and testing here. > > Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions: > > > compiler.compiler: +77% > compiler.sunflow: +69% > compress: +166% > crypto.rsa: +15% > crypto.signverify: +70% > mpegaudio: +8% > serial: +50% > sunflow: +57% > xml.transform: +61% > xml.validation: +43% > > > My new `java.lang.invoke` benchmarks improve a lot as well: > > > Benchmark Mode Cnt Score Error Units > > # Mainline > MHInvoke.methodHandle avgt 5 799.671 ± 9.087 ns/op > MHInvoke.plain avgt 5 261.947 ± 1.421 ns/op > VHGet.plain avgt 5 231.372 ± 3.044 ns/op > VHGet.varHandle avgt 5 924.880 ± 6.026 ns/op > > # This WIP > MHInvoke.methodHandle avgt 5 240.456 ± 3.931 ns/op > MHInvoke.plain avgt 5 70.851 ± 1.986 ns/op > VHGet.plain avgt 5 52.506 ± 3.768 ns/op > VHGet.varHandle avgt 5 335.785 ± 4.398 ns/op > > > It also palpably improves startup even on small HelloWorld, _even when compilers are present_: > > > $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null > > Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs): > > 22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% ) > 96 context-switches # 4.353 K/sec ( +- 0.07% ) > 7 cpu-migrations # 333.181 /sec ( +- 0.32% ) > 2,437 page-faults # 110.469 K/sec ( +- 0.00% ) > 78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%) > 2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%) > 2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%) > 67,296,528 instructions # 0.85 insn per cycle > # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%) > 12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%) > 384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%) > > 0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% ) > > $ perf stat -r 5000 build/interp-bang/bin/java
-Xms128m -Xmx128m Hello > /dev/null > > Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs): > > 21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% ) > 98 context-switches # 4.519 K/sec ( +- 0.07% ) > 7 cpu-migrations # 339.292 /sec ( +- 0.31% ) > 2,434 page-faults # 111.755 K/sec ( +- 0.00% ) > 77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%) > 2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%) > 2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%) > 66,742,892 instructions # 0.86 insn per cycle > # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%) > 12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%) > 386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%) > > 0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% ) > > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1` > - [x] Linux x86_64 fastdebug, `tier2` > - [x] Linux x86_64 fastdebug, `tier3` > - [x] Linux x86_32 fastdebug, `tier1` > - [x] Linux x86_32 fastdebug, `tier2` > - [x] Linux x86_32 fastdebug, `tier3` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Show watermark in better place on the chart ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7247/files - new: https://git.openjdk.java.net/jdk/pull/7247/files/ffd560ab..13073992 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7247&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7247&range=03-04 Stats: 12 lines in 1 file changed: 10 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7247.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7247/head:pull/7247 PR: https://git.openjdk.java.net/jdk/pull/7247 From shade at openjdk.java.net Tue Feb 8 17:24:44 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 8 Feb 2022 17:24:44 GMT Subject: RFR: 8072070: Improve 
interpreter stack banging [v4] In-Reply-To: <-sdR46GHtobc9KyphiiBn6TLfrmR4dDDpXwpaUsmuJg=.c6baf0ee-a06d-4fcb-a75a-349dc5646e18@github.com> References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> <-sdR46GHtobc9KyphiiBn6TLfrmR4dDDpXwpaUsmuJg=.c6baf0ee-a06d-4fcb-a75a-349dc5646e18@github.com> Message-ID: <0hYkIKMjil2vQcKwZqXCMmkYTcCrElpFswabP3fuDzI=.bc5bf151-1b7f-4943-82fa-8a48dd390c5e@github.com> On Tue, 8 Feb 2022 12:28:34 GMT, Quan Anh Mai wrote: >> Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Indents >> - Drop the test group definition >> - Update copyrights > > src/hotspot/share/runtime/stackOverflow.hpp line 121: > >> 119: // | shadow zone >> 120: // | >> 121: // -- --- <-- shadow_zone_growth_watermark() > > Hi, should the `watermark` be somewhere below (regarding address) the last frame instead? Good point, fixed in new commit. ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From mdoerr at openjdk.java.net Tue Feb 8 17:52:09 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 8 Feb 2022 17:52:09 GMT Subject: Integrated: 8281061: [s390] JFR runs into assertions while validating interpreter frames In-Reply-To: References: Message-ID: On Tue, 1 Feb 2022 17:22:57 GMT, Martin Doerr wrote: > s390 implementation requires small changes to avoid running into assertions in debug builds. See JBS for details. This pull request has now been integrated. 
Changeset: 7f19c700 Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/7f19c700707573000a37910dd6d2f2bb6e8439ad Stats: 18 lines in 2 files changed: 2 ins; 3 del; 13 mod 8281061: [s390] JFR runs into assertions while validating interpreter frames Reviewed-by: lucy, rrich ------------- PR: https://git.openjdk.java.net/jdk/pull/7312 From xliu at openjdk.java.net Tue Feb 8 17:56:10 2022 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 8 Feb 2022 17:56:10 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v5] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Tue, 8 Feb 2022 17:24:41 GMT, Aleksey Shipilev wrote: >> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. >> >> The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. >> >> This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration. 
>> >> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around. >> >> I think it is fairly complete, and so would like to solicit more feedback and testing here. >> >> Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions: >> >> >> compiler.compiler: +77% >> compiler.sunflow: +69% >> compress: +166% >> crypto.rsa: +15% >> crypto.signverify: +70% >> mpegaudio: +8% >> serial: +50% >> sunflow: +57% >> xml.transform: +61% >> xml.validation: +43% >> >> >> My new `java.lang.invoke` benchmarks improve a lot as well: >> >> >> Benchmark Mode Cnt Score Error Units >> >> # Mainline >> MHInvoke.methodHandle avgt 5 799.671 ± 9.087 ns/op >> MHInvoke.plain avgt 5 261.947 ± 1.421 ns/op >> VHGet.plain avgt 5 231.372 ± 3.044 ns/op >> VHGet.varHandle avgt 5 924.880 ± 6.026 ns/op >> >> # This WIP >> MHInvoke.methodHandle avgt 5 240.456 ± 3.931 ns/op >> MHInvoke.plain avgt 5 70.851 ± 1.986 ns/op >> VHGet.plain avgt 5 52.506 ± 3.768 ns/op >> VHGet.varHandle avgt 5 335.785 ±
4.398 ns/op >> >> >> It also palpably improves startup even on small HelloWorld, _even when compilers are present_: >> >> >> $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% ) >> 96 context-switches # 4.353 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 333.181 /sec ( +- 0.32% ) >> 2,437 page-faults # 110.469 K/sec ( +- 0.00% ) >> 78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%) >> 2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%) >> 2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%) >> 67,296,528 instructions # 0.85 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%) >> 12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%) >> 384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%) >> >> 0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% ) >> >> $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% ) >> 98 context-switches # 4.519 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 339.292 /sec ( +- 0.31% ) >> 2,434 page-faults # 111.755 K/sec ( +- 0.00% ) >> 77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%) >> 2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%) >> 2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%) >> 66,742,892 instructions # 0.86 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%) >> 12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%) >> 386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%) >> >> 0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% ) >> >> >> Additional 
testing: >> - [x] Linux x86_64 fastdebug, `tier1` >> - [x] Linux x86_64 fastdebug, `tier2` >> - [x] Linux x86_64 fastdebug, `tier3` >> - [x] Linux x86_32 fastdebug, `tier1` >> - [x] Linux x86_32 fastdebug, `tier2` >> - [x] Linux x86_32 fastdebug, `tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Show watermark in better place on the chart LGTM! I am not a reviewer. we still need reviewers to approve this. ------------- Marked as reviewed by xliu (Committer). PR: https://git.openjdk.java.net/jdk/pull/7247 From shade at openjdk.java.net Tue Feb 8 18:25:44 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 8 Feb 2022 18:25:44 GMT Subject: RFR: 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment Message-ID: I am following up on the performance issue where the culprit seems to be the too low `OptoLoopAlignment`. To perform better experiments, I suggest allowing larger alignments. Note that we cannot make `OptoLoopAlignment` larger than `CodeEntryAlignment`, because nmethod copy would break it, see assert in `MacroAssembler::align`. See [JDK-8273459](https://bugs.openjdk.java.net/browse/JDK-8273459) for latest discussion about it. So `CodeEntryAlignment` needs to be configurable as well. The default values for options are different per platform, so tests are x86_64 specific. No default value is changed, this only unblocks experiments. 
Additional testing: - [x] New tests on Linux x86_64 fastdebug - [x] New tests on Linux x86_64 release ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/7388/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7388&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281467 Stats: 178 lines in 4 files changed: 176 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7388.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7388/head:pull/7388 PR: https://git.openjdk.java.net/jdk/pull/7388 From hseigel at openjdk.java.net Tue Feb 8 18:42:08 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 8 Feb 2022 18:42:08 GMT Subject: RFR: 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment In-Reply-To: References: Message-ID: On Tue, 8 Feb 2022 18:19:00 GMT, Aleksey Shipilev wrote: > I am following up on the performance issue where the culprit seems to be the too low `OptoLoopAlignment`. To perform better experiments, I suggest allowing larger alignments. > > Note that we cannot make `OptoLoopAlignment` larger than `CodeEntryAlignment`, because nmethod copy would break it, see assert in `MacroAssembler::align`. See [JDK-8273459](https://bugs.openjdk.java.net/browse/JDK-8273459) for latest discussion about it. So `CodeEntryAlignment` needs to be configurable as well. > > The default values for options are different per platform, so tests are x86_64 specific. > > No default value is changed, this only unblocks experiments. > > Additional testing: > - [x] New tests on Linux x86_64 fastdebug > - [x] New tests on Linux x86_64 release src/hotspot/share/runtime/globals.hpp line 1539: > 1537: range(1, 128) \ > 1538: constraint(OptoLoopAlignmentConstraintFunc, AfterErgo) \ > 1539: \ Should OptoLoopAlignment be an int, instead of an intx, since its range is small? 
------------- PR: https://git.openjdk.java.net/jdk/pull/7388 From kbarrett at openjdk.java.net Tue Feb 8 20:22:10 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 8 Feb 2022 20:22:10 GMT Subject: RFR: 8280828: Improve invariants in NonblockingQueue::append [v2] In-Reply-To: References: <5RadwfEH_n0x_cLSSZePRdiQ5W6nRhfNJ_ns3ajDZtQ=.6a3a1eaa-55b3-4154-a5a3-6e2985a1ceaf@github.com> Message-ID: On Tue, 8 Feb 2022 12:08:51 GMT, Ivan Walulya wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into append-invariant >> - minor comment fixes >> - append invariant > > Not part of this PR, but we need to add a comment about `push/append` being susceptible to ABA behavior as discovered in JDK-8273383. Thanks @walulyai and @tschatzl for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/7250 From kbarrett at openjdk.java.net Tue Feb 8 20:22:11 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 8 Feb 2022 20:22:11 GMT Subject: RFR: 8280828: Improve invariants in NonblockingQueue::append [v2] In-Reply-To: References: <5RadwfEH_n0x_cLSSZePRdiQ5W6nRhfNJ_ns3ajDZtQ=.6a3a1eaa-55b3-4154-a5a3-6e2985a1ceaf@github.com> Message-ID: On Tue, 8 Feb 2022 13:12:40 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into append-invariant >> - minor comment fixes >> - append invariant > > src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 200: > >> 198: // cmpxchg indicates a concurrent operation updated _head first. 
That >> 199: // could be either a push/append or a try_pop in [Clause 1b]. >> 200: Atomic::cmpxchg(&_head, result, (T*)NULL); > > These `NULL`s could be replaced by `nullptr`. I'm planning to do that, since this code has been going through a lot of recent churn anyway. But I didn't want to mix that cleanup with functional changes. ------------- PR: https://git.openjdk.java.net/jdk/pull/7250 From kbarrett at openjdk.java.net Tue Feb 8 20:32:45 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 8 Feb 2022 20:32:45 GMT Subject: RFR: 8280828: Improve invariants in NonblockingQueue::append [v3] In-Reply-To: <5RadwfEH_n0x_cLSSZePRdiQ5W6nRhfNJ_ns3ajDZtQ=.6a3a1eaa-55b3-4154-a5a3-6e2985a1ceaf@github.com> References: <5RadwfEH_n0x_cLSSZePRdiQ5W6nRhfNJ_ns3ajDZtQ=.6a3a1eaa-55b3-4154-a5a3-6e2985a1ceaf@github.com> Message-ID: > Please review this change to NonblockingQueue to improve invariants in the > append operation by making a change in try_pop. > > When taking the last entry in the queue, try_pop needs to do some cleanup of > the queue fields, setting them to NULL. The order of those cleanups doesn't > matter for correctness. However, setting first _head then _tail permits > append to assert that _head is NULL when it finds _tail was NULL. The current > order (set _tail first, then _head) doesn't permit such an assertion. > > Testing: > mach5 tier1-3 > > I also did lots of testing with this change included while investigating > JDK-8273383. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into append-invariant - Merge branch 'master' into append-invariant - minor comment fixes - append invariant ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7250/files - new: https://git.openjdk.java.net/jdk/pull/7250/files/9648d183..89b8f300 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7250&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7250&range=01-02 Stats: 379 lines in 11 files changed: 325 ins; 30 del; 24 mod Patch: https://git.openjdk.java.net/jdk/pull/7250.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7250/head:pull/7250 PR: https://git.openjdk.java.net/jdk/pull/7250 From kbarrett at openjdk.java.net Tue Feb 8 20:32:47 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 8 Feb 2022 20:32:47 GMT Subject: Integrated: 8280828: Improve invariants in NonblockingQueue::append In-Reply-To: <5RadwfEH_n0x_cLSSZePRdiQ5W6nRhfNJ_ns3ajDZtQ=.6a3a1eaa-55b3-4154-a5a3-6e2985a1ceaf@github.com> References: <5RadwfEH_n0x_cLSSZePRdiQ5W6nRhfNJ_ns3ajDZtQ=.6a3a1eaa-55b3-4154-a5a3-6e2985a1ceaf@github.com> Message-ID: On Thu, 27 Jan 2022 20:34:29 GMT, Kim Barrett wrote: > Please review this change to NonblockingQueue to improve invariants in the > append operation by making a change in try_pop. > > When taking the last entry in the queue, try_pop needs to do some cleanup of > the queue fields, setting them to NULL. The order of those cleanups doesn't > matter for correctness. However, setting first _head then _tail permits > append to assert that _head is NULL when it finds _tail was NULL. The current > order (set _tail first, then _head) doesn't permit such an assertion. > > Testing: > mach5 tier1-3 > > I also did lots of testing with this change included while investigating > JDK-8273383. This pull request has now been integrated. 
Changeset: d658d945 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/d658d945cf57bab8e61302841dcb56b36e48eff3 Stats: 45 lines in 1 file changed: 19 ins; 6 del; 20 mod 8280828: Improve invariants in NonblockingQueue::append Reviewed-by: iwalulya, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/7250 From kbarrett at openjdk.java.net Tue Feb 8 23:05:27 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 8 Feb 2022 23:05:27 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue Message-ID: Please review this update of the usage and implementation comments for NonblockingQueue to discuss the ABA issue in push/append operations. ------------- Commit messages: - document ABA for push/append Changes: https://git.openjdk.java.net/jdk/pull/7393/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7393&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8280832 Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7393.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7393/head:pull/7393 PR: https://git.openjdk.java.net/jdk/pull/7393 From kbarrett at openjdk.java.net Tue Feb 8 23:33:26 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 8 Feb 2022 23:33:26 GMT Subject: RFR: 8280830: Change NonblockingQueue::try_pop variable named "result" Message-ID: <-k-4lPQNjY4Xx-eT0djBmXpv3uwWViUQ82pPO2qTbzA=.1e6c2826-5975-4777-a43c-0a43c8b7bf9b@github.com> Please review this trivial change to rename a variable in NonblockingQueue::try_pop. The variable named "result" is being renamed to "old_head", as the old name was found to be confusing by some people, making the code harder to read. Testing: local build. 
------------- Commit messages: - change variable name Changes: https://git.openjdk.java.net/jdk/pull/7394/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7394&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8280830 Stats: 24 lines in 1 file changed: 0 ins; 0 del; 24 mod Patch: https://git.openjdk.java.net/jdk/pull/7394.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7394/head:pull/7394 PR: https://git.openjdk.java.net/jdk/pull/7394 From dholmes at openjdk.java.net Wed Feb 9 00:37:10 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 9 Feb 2022 00:37:10 GMT Subject: RFR: 8280830: Change NonblockingQueue::try_pop variable named "result" In-Reply-To: <-k-4lPQNjY4Xx-eT0djBmXpv3uwWViUQ82pPO2qTbzA=.1e6c2826-5975-4777-a43c-0a43c8b7bf9b@github.com> References: <-k-4lPQNjY4Xx-eT0djBmXpv3uwWViUQ82pPO2qTbzA=.1e6c2826-5975-4777-a43c-0a43c8b7bf9b@github.com> Message-ID: On Tue, 8 Feb 2022 23:27:41 GMT, Kim Barrett wrote: > Please review this trivial change to rename a variable in > NonblockingQueue::try_pop. The variable named "result" is being renamed to > "old_head", as the old name was found to be confusing by some people, making > the code harder to read. > > Testing: > local build. Looks good and trivial. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7394 From dholmes at openjdk.java.net Wed Feb 9 00:43:03 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 9 Feb 2022 00:43:03 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue In-Reply-To: References: Message-ID: On Tue, 8 Feb 2022 22:58:26 GMT, Kim Barrett wrote: > Please review this update of the usage and implementation comments for > NonblockingQueue to discuss the ABA issue in push/append operations. 
src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 122: > 120: // try_pop could take old_tail before our update, it gets recycled and > 121: // re-added to the end, and then we successfully cmpxchg, rendering the > 122: // list in _tail circular. Doesn't this contradict the "We won any races with try_pop ... so we're done"! ------------- PR: https://git.openjdk.java.net/jdk/pull/7393 From Divino.Cesar at microsoft.com Wed Feb 9 01:45:57 2022 From: Divino.Cesar at microsoft.com (Cesar Soares Lucas) Date: Wed, 9 Feb 2022 01:45:57 +0000 Subject: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement Message-ID:

Hi there again!

Can you please give me feedback on the following approach to at least partially
address [1], the scalar replacement allocation merge issue?

The problem that I am trying to solve arises when allocations are merged after a
control flow split. The code below shows _one example_ of such a situation.

public int ex1(boolean cond, int x, int y) {
    Point p = new Point(x, y);
    if (cond)
        p = new Point(y, x);
    // Allocations for p are merged here.
    return p.calc();
}

Assuming the method calls on "p" are inlined then the allocations will not
escape the method. The C2 IR for this method will look like this:

public int ex1(boolean cond, int first, int second) {
    p0 = Allocate(...);
    ...
    p0.x = first;
    p0.y = second;
    if (cond) {
        p1 = Allocate(...);
        ...
        p1.x = second;
        p1.y = first;
    }
    p = phi(p0, p1)
    return p.x - p.y;
}

However, one of the constraints implemented here [2], specifically the third
one, will prevent the objects from being scalar replaced.

The approach that I'm considering for solving the problem is to replace the Phi
node `p = phi(p0, p1)` with new Phi nodes for each of the fields of the objects
in the original Phi. The IR for `ex1` would look something like this after the
transformation:

public int ex1(boolean cond, int first, int second) {
    p0 = Allocate(...);
    ...
    p0.x = first;
    p0.y = second;
    if (cond) {
        p1 = Allocate(...);
        ...
        p1.x = second;
        p1.y = first;
    }
    pX = phi(first, second)
    pY = phi(second, first)
    return pX - pY;
}

I understand that this transformation might not be applicable for all cases and
that it's not as simple as illustrated above. Also, it seems to me that much of
what I'd have to implement is already implemented in other steps of the Scalar
Replacement pipeline (which is a good thing). To work around these
implementation details I plan to use as much of the existing code as possible.
The algorithm for the transformation would be like this:

split_phis(phi)
    # If output of phi escapes, or something uses its identity, etc
    # then we can't remove it. The conditions here might possibly be the
    # same as the ones implemented in `PhaseMacroExpand::can_eliminate_allocation`
    if cant_remove_phi_output(phi)
        return ;

    # Collect a set of tuples(F,U) containing nodes U that use field F
    # member of the object resulting from `phi`.
    fields_used = collect_fields_used_after_phi(phi)

    foreach field in fields_used
        producers = {}

        # Create a list with the last Store for each field "field" on the
        # scope of each of the Phi input objects.
        foreach o in phi.inputs
            # The function called below might re-use a lot of the code/logic in `PhaseMacroExpand::scalar_replacement`
            producers += last_store_to_o_field(o, field)

        # Create a new phi node whose inputs are the Store's to 'field'
        field_phi = create_new_phi(producers)

        update_consumers(field, field_phi)

The implementation that I envisioned would be as a "pre-process" [3] step just
after EA but before the constraint checks in `adjust_scalar_replaceable_state`
[2]. If we agree that the overall Scalar Replacement implementation goes through
the following major phases:

    1. Identify the Escape Status of objects.
    2. Adjust object Escape and/or Scalar Replacement status based on a set of constraints.
    3. Make call to Split_unique_types [4].
    4. Iterate over object and array allocations.
        4.1 Check if allocation can be eliminated.
        4.2 Perform scalar replacement. Replace uses of object in Safepoints.
        4.3 Process users of CheckCastPP other than Safepoint: AddP, ArrayCopy and CastP2X.

The transformation that I am proposing would change the overall flow to look
like this:

    1. Identify the Escape Status of objects.
    2. ----> New: "Split phi functions" <----
    2. Adjust object Escape and/or Scalar Replacement status based on a set of constraints.
    3. Make call to Split_unique_types [4].
    4. Iterate over object and array allocations.
        4.1 ----> Moved to split_phi: "Check if allocation can be eliminated" <----
        4.2 Perform scalar replacement. Replace uses of object in Safepoints.
        4.3 Process users of CheckCastPP other than Safepoint: AddP, ArrayCopy and CastP2X.

Please let me know what you think and thank you for taking the time to review
this!

Regards,
Cesar

Notes:

    [1] I am not sure yet how this approach will play with the case of a merge
        with NULL.

    [2] https://github.com/openjdk/jdk/blob/2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff/src/hotspot/share/opto/escape.cpp#L1809

    [3] Another option would be to "patch" the current implementation to be able
        to handle the merges. I am not certain that the "patch" approach would be
        better, however, the "pre-process" approach is certainly much easier to test
        and more readable.

    [4] I cannot say I understand 100% the effects of executing
        split_unique_types(). Would the transformation that I am proposing need to
        be after the call to split_unique_types?
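
For readers who want to see the intended effect of the per-field Phi split at the source level, the following is a minimal Java sketch. It is written by hand and is not what C2 emits. It assumes that `Point.calc()` returns `x - y` (inferred from the IR's `return p.x - p.y`), and the names `MergeSketch` and `ex1Split` are hypothetical, introduced only for illustration. Each conditional expression stands in for one of the field Phis `pX = phi(first, second)` and `pY = phi(second, first)`.

```java
final class MergeSketch {
    // Assumed shape of Point; calc() is taken to be x - y, matching the IR above.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
        int calc() { return x - y; }
    }

    // Original shape: two allocations merge at the join point after the branch.
    static int ex1(boolean cond, int x, int y) {
        Point p = new Point(x, y);
        if (cond) {
            p = new Point(y, x);
        }
        // Allocations for p are merged here.
        return p.calc();
    }

    // Hand-written equivalent after splitting the object Phi into one Phi per field.
    // No Point is allocated; only the field values are merged.
    static int ex1Split(boolean cond, int first, int second) {
        int pX = cond ? second : first;  // pX = phi(first, second)
        int pY = cond ? first : second;  // pY = phi(second, first)
        return pX - pY;
    }

    public static void main(String[] args) {
        // Both versions should agree for either branch.
        System.out.println(ex1(false, 5, 3) == ex1Split(false, 5, 3));
        System.out.println(ex1(true, 5, 3) == ex1Split(true, 5, 3));
    }
}
```

The sketch only covers the simple case from the message, where neither allocation escapes and every use of the merged object is a field load. It says nothing about merges with NULL or about uses of the object's identity.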
From kbarrett at openjdk.java.net Wed Feb 9 04:14:11 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 9 Feb 2022 04:14:11 GMT Subject: RFR: 8280830: Change NonblockingQueue::try_pop variable named "result" In-Reply-To: References: <-k-4lPQNjY4Xx-eT0djBmXpv3uwWViUQ82pPO2qTbzA=.1e6c2826-5975-4777-a43c-0a43c8b7bf9b@github.com> Message-ID: <2XNE2sqbmYSMG2SFhrYR-omuAIpbnJZp8-yRCoyNdEA=.f60f5cfd-3481-4d20-944f-f91246ed5d2f@github.com> On Wed, 9 Feb 2022 00:33:55 GMT, David Holmes wrote: >> Please review this trivial change to rename a variable in >> NonblockingQueue::try_pop. The variable named "result" is being renamed to >> "old_head", as the old name was found to be confusing by some people, making >> the code harder to read. >> >> Testing: >> local build. > > Looks good and trivial. > > Thanks, > David Thanks @dholmes-ora for reviewing. ------------- PR: https://git.openjdk.java.net/jdk/pull/7394 From kbarrett at openjdk.java.net Wed Feb 9 04:14:11 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 9 Feb 2022 04:14:11 GMT Subject: Integrated: 8280830: Change NonblockingQueue::try_pop variable named "result" In-Reply-To: <-k-4lPQNjY4Xx-eT0djBmXpv3uwWViUQ82pPO2qTbzA=.1e6c2826-5975-4777-a43c-0a43c8b7bf9b@github.com> References: <-k-4lPQNjY4Xx-eT0djBmXpv3uwWViUQ82pPO2qTbzA=.1e6c2826-5975-4777-a43c-0a43c8b7bf9b@github.com> Message-ID: On Tue, 8 Feb 2022 23:27:41 GMT, Kim Barrett wrote: > Please review this trivial change to rename a variable in > NonblockingQueue::try_pop. The variable named "result" is being renamed to > "old_head", as the old name was found to be confusing by some people, making > the code harder to read. > > Testing: > local build. This pull request has now been integrated. 
Changeset: 13f739d3 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/13f739d330e393f840d134f5327a025957e1f795 Stats: 24 lines in 1 file changed: 0 ins; 0 del; 24 mod 8280830: Change NonblockingQueue::try_pop variable named "result" Reviewed-by: dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/7394 From kbarrett at openjdk.java.net Wed Feb 9 04:20:10 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 9 Feb 2022 04:20:10 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue In-Reply-To: References: Message-ID: On Wed, 9 Feb 2022 00:39:34 GMT, David Holmes wrote: >> Please review this update of the usage and implementation comments for >> NonblockingQueue to discuss the ABA issue in push/append operations. > > src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 122: > >> 120: // try_pop could take old_tail before our update, it gets recycled and >> 121: // re-added to the end, and then we successfully cmpxchg, rendering the >> 122: // list in _tail circular. > > Doesn't this contradict the "We won any races with try_pop ... so we're done"! The client of this class is expected to prevent ABA from occurring. Some of the mechanisms that might be used for doing so include separate phases for push/pop and preventing recycling while some thread might be in the midst of one of the problem operations. The only current user of this class is G1DirtyCardQueueSet, where a combination of GlobalCounter critical sections and safepoint boundaries are used to ensure ABA can't happen. 
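
Editorial sketch: for readers unfamiliar with the ABA problem discussed here, the following single-threaded Java snippet replays the problematic interleaving deterministically. It is not NonblockingQueue code (which is C++), and the class and method names are hypothetical. The second method uses a version stamp via `AtomicStampedReference`, which is a different remedy from the one described above (GlobalCounter critical sections and safepoint boundaries); it is shown only to make the failure mode visible.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

class AbaSketch {
    static final class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    /** Plain CAS: the stale compare-and-set succeeds after an A -> B -> A change. */
    public static boolean staleCasSucceedsPlain() {
        Node a = new Node("A");
        Node b = new Node("B");
        AtomicReference<Node> tail = new AtomicReference<>(a);

        Node observed = tail.get(); // "thread 1" reads the tail, then stalls
        tail.set(b);                // "thread 2" replaces A with B ...
        tail.set(a);                // ... and the recycled A comes back
        // The CAS compares references only, so it cannot see the change.
        return tail.compareAndSet(observed, b);
    }

    /** Stamped CAS: the same interleaving is detected via the version stamp. */
    public static boolean staleCasSucceedsStamped() {
        Node a = new Node("A");
        Node b = new Node("B");
        AtomicStampedReference<Node> tail = new AtomicStampedReference<>(a, 0);

        int[] stamp = new int[1];
        Node observed = tail.get(stamp); // reads reference and stamp 0
        tail.set(b, stamp[0] + 1);       // stamp becomes 1
        tail.set(a, stamp[0] + 2);       // A is back, but the stamp is now 2
        // Expected stamp 0 no longer matches, so the stale CAS fails.
        return tail.compareAndSet(observed, b, stamp[0], stamp[0] + 3);
    }

    public static void main(String[] args) {
        System.out.println("plain CAS succeeded:   " + staleCasSucceedsPlain());
        System.out.println("stamped CAS succeeded: " + staleCasSucceedsStamped());
    }
}
```

The first call returns true and the second returns false. Excluding the interleaving altogether, as the G1DirtyCardQueueSet client is said to do, avoids the need for any per-update stamp.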
------------- PR: https://git.openjdk.java.net/jdk/pull/7393 From shade at openjdk.java.net Wed Feb 9 06:55:09 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 9 Feb 2022 06:55:09 GMT Subject: RFR: 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment In-Reply-To: References: Message-ID: <5tnuK3pwhbOWk8dJlEkELJoxEFhmDyZFwpG5DfkozQ4=.b3cad787-bb0d-4716-91ed-079669da8eb0@github.com> On Tue, 8 Feb 2022 18:39:07 GMT, Harold Seigel wrote: >> I am following up on the performance issue where the culprit seems to be the too low `OptoLoopAlignment`. To perform better experiments, I suggest allowing larger alignments. >> >> Note that we cannot make `OptoLoopAlignment` larger than `CodeEntryAlignment`, because nmethod copy would break it, see assert in `MacroAssembler::align`. See [JDK-8273459](https://bugs.openjdk.java.net/browse/JDK-8273459) for latest discussion about it. So `CodeEntryAlignment` needs to be configurable as well. >> >> The default values for options are different per platform, so tests are x86_64 specific. >> >> No default value is changed, this only unblocks experiments. >> >> Additional testing: >> - [x] New tests on Linux x86_64 fastdebug >> - [x] New tests on Linux x86_64 release > > src/hotspot/share/runtime/globals.hpp line 1539: > >> 1537: range(1, 128) \ >> 1538: constraint(OptoLoopAlignmentConstraintFunc, AfterErgo) \ >> 1539: \ > > Should OptoLoopAlignment be an int, instead of an intx, since its range is small? Dunno, maybe? I see the lot of other "small" options are `intx`, and the change like that would proliferate to all architectures that set `OptoLoopAlignment` as their `product_pd`. It also raises the question if `CodeEntryAlignment` should also be `int`? I'd rather keep this patch small, to be honest. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7388 From iwalulya at openjdk.java.net Wed Feb 9 10:14:06 2022 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 9 Feb 2022 10:14:06 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue In-Reply-To: References: Message-ID: On Tue, 8 Feb 2022 22:58:26 GMT, Kim Barrett wrote: > Please review this update of the usage and implementation comments for > NonblockingQueue to discuss the ABA issue in push/append operations. Minor suggestion src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 120: > 118: // old_tail for extension. We won any races with try_pop by changing > 119: // away from end-marker. So we're done. Note that ABA is possible; > 120: // try_pop could take old_tail before our update, it gets recycled and "a concurrent try_pop could take...." ------------- Marked as reviewed by iwalulya (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7393 From duke at openjdk.java.net Wed Feb 9 13:22:06 2022 From: duke at openjdk.java.net (Bhavana-Kilambi) Date: Wed, 9 Feb 2022 13:22:06 GMT Subject: Integrated: 8280007: Enable Neoverse N1 optimizations for Arm Neoverse V1 & N2 In-Reply-To: <5-WQEPc2lrSR_d0pVtsoFDT45Je1TJtJAdxAiBbEc9U=.6adf8246-dd10-4518-bd91-67e6fdd6eed9@github.com> References: <5-WQEPc2lrSR_d0pVtsoFDT45Je1TJtJAdxAiBbEc9U=.6adf8246-dd10-4518-bd91-67e6fdd6eed9@github.com> Message-ID: On Tue, 8 Feb 2022 15:33:20 GMT, Bhavana-Kilambi wrote: > As Arm Neoverse V1 and N2s will benefit from the same optimizations as Neoverse N1 does, it should have OnSpinWaitInst/OnSpinWaitInstCount defaults set to "isb"/1 and UseSIMDForMemoryOps default set to true. > This patch sets these flags accordingly for both V1 and N2 architectures. This pull request has now been integrated. 
Changeset: f823bed0 Author: Bhavana Kilambi Committer: Paul Hohensee URL: https://git.openjdk.java.net/jdk/commit/f823bed043dc38d838baaf8c2024ef24b8a50e9b Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod 8280007: Enable Neoverse N1 optimizations for Arm Neoverse V1 & N2 Reviewed-by: phh ------------- PR: https://git.openjdk.java.net/jdk/pull/7383 From redestad at openjdk.java.net Wed Feb 9 14:06:48 2022 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 9 Feb 2022 14:06:48 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives Message-ID: I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range that are all positive. This allows for dealing much more efficiently with those `byte[]`s that have an ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input. Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904 - Only implemented on x86 for now, but I want to verify that `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. This pretty much means holding this up until it's implemented on all platforms, which can either be contributed to this PR or done as dependent follow-ups. - An alternative to holding up until all platforms are on board is to allow `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified one. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented.
One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field. - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`. Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even see a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix). ------------- Commit messages: - Let countPositives use hasNegatives to allow ports not implementing the countPositives intrinsic to stay neutral - Simplify changes to encodeUTF8 - Fix little-endian error caught by testing - Reduce jumps in the ascii path - Remove unused tail_mask - Remove has_negatives intrinsic on x86 (and hook up 32-bit x86 to use count_positives) - Add more comments, simplify tail branching in AVX512 variant - Resolve issues in the precise implementation - Add shortMixed micros, cleanups - Adjust the countPositives intrinsic to count the bytes exactly. - ... 
and 11 more: https://git.openjdk.java.net/jdk/compare/cab59051...2a855eb6 Changes: https://git.openjdk.java.net/jdk/pull/7231/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281146 Stats: 806 lines in 24 files changed: 586 ins; 84 del; 136 mod Patch: https://git.openjdk.java.net/jdk/pull/7231.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7231/head:pull/7231 PR: https://git.openjdk.java.net/jdk/pull/7231 From duke at openjdk.java.net Wed Feb 9 15:44:59 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Wed, 9 Feb 2022 15:44:59 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v3] In-Reply-To: References: Message-ID: On Wed, 19 Jan 2022 01:10:34 GMT, David Holmes wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8278423' of https://github.com/eme64/jdk into JDK-8278423 >> - added flag to VMDeprecatedOptions Test > > src/hotspot/os/aix/attachListener_aix.cpp line 31: > >> 29: #include "runtime/os.inline.hpp" >> 30: #include "services/attachListener.hpp" >> 31: > > These changes are somewhat independent of the deprecation issue and could be split out into a separate RFE. The serviceability folk may have an opinion. @dholmes-ora Ok, I reverted this and will do it in a separate RFE. ------------- PR: https://git.openjdk.java.net/jdk/pull/7110 From duke at openjdk.java.net Wed Feb 9 15:44:58 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Wed, 9 Feb 2022 15:44:58 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v5] In-Reply-To: References: Message-ID: > Deprecated ExtendedDTraceProbes. > Edited help messages and man pages accordingly. > Added flag to VMDeprecatedOptions test. > > Checked that tests are not affected. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Revert "removed file with declarations that are never defined or used: /src/hotspot/share/services/dtraceAttacher.hpp" This reverts commit 885b985bb3618fc621cac1a32159b5449b5026fb. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7110/files - new: https://git.openjdk.java.net/jdk/pull/7110/files/0f161b01..7a93ecae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7110&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7110&range=03-04 Stats: 55 lines in 5 files changed: 55 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7110.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7110/head:pull/7110 PR: https://git.openjdk.java.net/jdk/pull/7110 From duke at openjdk.java.net Wed Feb 9 15:54:47 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Wed, 9 Feb 2022 15:54:47 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v6] In-Reply-To: References: Message-ID: <1aigQ1zndvru8IqC1RgnZ8mGZecF4kaxUi2039maPPQ=.59745cd8-5be3-4b2a-8976-42ac25404987@github.com> > Deprecated ExtendedDTraceProbes. > Edited help messages and man pages accordingly. > Added flag to VMDeprecatedOptions test. > > Checked that tests are not affected. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: replaced with 3 flags in test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7110/files - new: https://git.openjdk.java.net/jdk/pull/7110/files/7a93ecae..be3e0b81 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7110&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7110&range=04-05 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7110.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7110/head:pull/7110 PR: https://git.openjdk.java.net/jdk/pull/7110 From duke at openjdk.java.net Wed Feb 9 15:54:50 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Wed, 9 Feb 2022 15:54:50 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v4] In-Reply-To: References: Message-ID: On Tue, 18 Jan 2022 18:42:27 GMT, Harold Seigel wrote: > Can you replace the use of -XX:+ExtendedDTraceProbes in test/hotspot/jtreg/serviceability/7170638/SDTProbesGNULinuxTest.java with the three new flags ? Thank you @hseigel , I patched it according to your suggestion. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7110 From duke at openjdk.java.net Wed Feb 9 16:10:10 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Wed, 9 Feb 2022 16:10:10 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jan 2022 01:12:16 GMT, David Holmes wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> moved deprecated flag to deprecated section in manpages > > src/hotspot/share/runtime/arguments.cpp line 2884: > >> 2882: #if defined(DTRACE_ENABLED) >> 2883: warning("Option ExtendedDTraceProbes was deprecated in version 19 and will likely be removed in a future release."); >> 2884: warning("Use a combination of -XX:+DTraceMethodProbes, -XX:+DTraceAllocProbes and -XX:+DTraceMonitorProbes instead."); > > s/a/the/ > > Applies to all three uses. Agreed, changing it ------------- PR: https://git.openjdk.java.net/jdk/pull/7110 From duke at openjdk.java.net Wed Feb 9 16:20:49 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Wed, 9 Feb 2022 16:20:49 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v7] In-Reply-To: References: Message-ID: > Deprecated ExtendedDTraceProbes. > Edited help messages and man pages accordingly. > Added flag to VMDeprecatedOptions test. > > Checked that tests are not affected. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: updated warning messages and added 3 flags to man-pages ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7110/files - new: https://git.openjdk.java.net/jdk/pull/7110/files/be3e0b81..b05ecfa2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7110&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7110&range=05-06 Stats: 18 lines in 2 files changed: 16 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7110.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7110/head:pull/7110 PR: https://git.openjdk.java.net/jdk/pull/7110 From duke at openjdk.java.net Wed Feb 9 16:20:51 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Wed, 9 Feb 2022 16:20:51 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v4] In-Reply-To: References: Message-ID: On Wed, 19 Jan 2022 01:17:21 GMT, David Holmes wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> moved deprecated flag to deprecated section in manpages > > src/java.base/share/man/java.1 line 4001: > >> 3999: .TP >> 4000: .B \f[CB]\-XX:+ExtendedDTraceProbes\f[R] >> 4001: Deprecated. Use combination of these flags instead: -XX:+DTraceMethodProbes, -XX:+DTraceAllocProbes, -XX:+DTraceMonitorProbes > > Delete "Deprecated" as we are in the deprecated options section. > > The wording also needs updating as per the warning text ... though that might read a little odd here so I suggest a tweak: > > Use the combination of -XX:+DTraceMethodProbes, -XX:+DTraceAllocProbes and -XX:+DTraceMonitorProbes instead of this deprecated flag. > > > I would also move that new text to the end, so we still describe the flag first (otherwise it again reads a little odd.) > > We will also need to add those flags to the "ADVANCED SERVICEABILITY OPTIONS FOR JAVA" section. 
@dholmes-ora thanks for the suggestions, I implemented them. ------------- PR: https://git.openjdk.java.net/jdk/pull/7110 From mdoerr at openjdk.java.net Wed Feb 9 18:08:12 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 9 Feb 2022 18:08:12 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v5] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Tue, 8 Feb 2022 17:24:41 GMT, Aleksey Shipilev wrote: >> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. >> >> The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. >> >> This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration. >> >> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around. 
>> >> I think it is fairly complete, and so would like to solicit more feedback and testing here. >> >> Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions: >> >> >> compiler.compiler: +77% >> compiler.sunflow: +69% >> compress: +166% >> crypto.rsa: +15% >> crypto.signverify: +70% >> mpegaudio: +8% >> serial: +50% >> sunflow: +57% >> xml.transform: +61% >> xml.validation: +43% >> >> >> My new `java.lang.invoke` benchmarks improve a lot as well: >> >> >> Benchmark Mode Cnt Score Error Units >> >> # Mainline >> MHInvoke.methodHandle avgt 5 799.671 ± 9.087 ns/op >> MHInvoke.plain avgt 5 261.947 ± 1.421 ns/op >> VHGet.plain avgt 5 231.372 ± 3.044 ns/op >> VHGet.varHandle avgt 5 924.880 ± 6.026 ns/op >> >> # This WIP >> MHInvoke.methodHandle avgt 5 240.456 ± 3.931 ns/op >> MHInvoke.plain avgt 5 70.851 ± 1.986 ns/op >> VHGet.plain avgt 5 52.506 ± 3.768 ns/op >> VHGet.varHandle avgt 5 335.785 ± 4.398 ns/op >> >> >> It also palpably improves startup even on small HelloWorld, _even when compilers are present_: >> >> >> $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% ) >> 96 context-switches # 4.353 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 333.181 /sec ( +- 0.32% ) >> 2,437 page-faults # 110.469 K/sec ( +- 0.00% ) >> 78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%) >> 2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%) >> 2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%) >> 67,296,528 instructions # 0.85 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%) >> 12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%) >> 384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%) >> >> 0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41%
) >> >> $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% ) >> 98 context-switches # 4.519 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 339.292 /sec ( +- 0.31% ) >> 2,434 page-faults # 111.755 K/sec ( +- 0.00% ) >> 77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%) >> 2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%) >> 2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%) >> 66,742,892 instructions # 0.86 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%) >> 12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%) >> 386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%) >> >> 0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% ) >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1` >> - [x] Linux x86_64 fastdebug, `tier2` >> - [x] Linux x86_64 fastdebug, `tier3` >> - [x] Linux x86_32 fastdebug, `tier1` >> - [x] Linux x86_32 fastdebug, `tier2` >> - [x] Linux x86_32 fastdebug, `tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Show watermark in better place on the chart LGTM. And a step into the right direction IMHO. We should check the code on other platforms, too (separately is ok). ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7247 From hseigel at openjdk.java.net Wed Feb 9 18:51:10 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Wed, 9 Feb 2022 18:51:10 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v7] In-Reply-To: References: Message-ID: On Wed, 9 Feb 2022 16:20:49 GMT, Emanuel Peter wrote: >> Deprecated ExtendedDTraceProbes. 
>> Edited help messages and man pages accordingly, added the 3 flags to man pages. >> Added flag to VMDeprecatedOptions test. >> Replaced the flag with 3 flags in SDTProbesGNULinuxTest.java. >> >> Checked that tests are not affected. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > updated warning messages and added 3 flags to man-pages Other than the need to update the copyright dates to 2022, these changes look good. Thanks, Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7110 From hseigel at openjdk.java.net Wed Feb 9 18:58:05 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Wed, 9 Feb 2022 18:58:05 GMT Subject: RFR: 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment In-Reply-To: <5tnuK3pwhbOWk8dJlEkELJoxEFhmDyZFwpG5DfkozQ4=.b3cad787-bb0d-4716-91ed-079669da8eb0@github.com> References: <5tnuK3pwhbOWk8dJlEkELJoxEFhmDyZFwpG5DfkozQ4=.b3cad787-bb0d-4716-91ed-079669da8eb0@github.com> Message-ID: <4k5B_eeCIPWe4rTYueR7n0lixNRMFzItoV9U7lCfIbM=.ada0c192-7ca6-4d4b-bdcb-a912e7867aa5@github.com> On Wed, 9 Feb 2022 06:51:47 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/runtime/globals.hpp line 1539: >> >>> 1537: range(1, 128) \ >>> 1538: constraint(OptoLoopAlignmentConstraintFunc, AfterErgo) \ >>> 1539: \ >> >> Should OptoLoopAlignment be an int, instead of an intx, since its range is small? > > Dunno, maybe? I see the lot of other "small" options are `intx`, and the change like that would proliferate to all architectures that set `OptoLoopAlignment` as their `product_pd`. It also raises the question if `CodeEntryAlignment` should also be `int`? I'd rather keep this patch small, to be honest. Your comment makes sense. Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7388 From iwalulya at openjdk.java.net Wed Feb 9 19:10:05 2022 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 9 Feb 2022 19:10:05 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock In-Reply-To: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: On Tue, 18 Jan 2022 12:03:46 GMT, Albert Mingkun Yang wrote: > This PR consists of two commits: > > 1. remove `ExpandHeap_lock` in Serial GC code. > 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. > > Test: tier1-6 lgtm! ------------- Marked as reviewed by iwalulya (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7124 From sviswanathan at openjdk.java.net Wed Feb 9 23:27:07 2022 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 9 Feb 2022 23:27:07 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts In-Reply-To: References: Message-ID: On Sat, 5 Feb 2022 15:34:08 GMT, Quan Anh Mai wrote: > Hi, > > This patch implements the unsigned upcast intrinsics in x86, which are used in vector lane-wise reinterpreting operations. > > Thank you very much. src/hotspot/cpu/x86/assembler_x86.cpp line 4782: > 4780: vector_len == AVX_256bit? VM_Version::supports_avx2() : > 4781: vector_len == AVX_512bit? VM_Version::supports_evex() : 0, " "); > 4782: InstructionAttr attributes(vector_len, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ true); legacy_mode should be false here instead of _legacy_mode_bw. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7358 From kbarrett at openjdk.java.net Thu Feb 10 03:14:05 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 10 Feb 2022 03:14:05 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock In-Reply-To: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: On Tue, 18 Jan 2022 12:03:46 GMT, Albert Mingkun Yang wrote: > This PR consists of two commits: > > 1. remove `ExpandHeap_lock` in Serial GC code. > 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. > > Test: tier1-6 I'm not keen on the suggested new name. I want to read ParallelExpandHeap_lock as a lock for parallel expansion of the heap, which doesn't really have the right flavor of subsystem ownership. I think (with this change) its uses are limited to ParallelGC oldgen expansion, suggesting a name like PSOldGenExpand_lock (or PSOldGen::Expand_lock, which could be private were it not for the assert in MutableSpace and the "usual" practice of defining locks in mutexLocker.[ch]pp). Other than that naming issue, nice cleanup. ------------- PR: https://git.openjdk.java.net/jdk/pull/7124 From kbarrett at openjdk.java.net Thu Feb 10 05:54:43 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 10 Feb 2022 05:54:43 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue [v2] In-Reply-To: References: Message-ID: <5kzE-8-k1Npd6wlCu_kc2xyP1NyC7iJQA4ymcl5O1Ac=.0dd5114e-d4a2-4c39-970d-93995239323b@github.com> > Please review this update of the usage and implementation comments for > NonblockingQueue to discuss the ABA issue in push/append operations. 
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: walulyai review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7393/files - new: https://git.openjdk.java.net/jdk/pull/7393/files/8c6593dc..fe7cc130 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7393&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7393&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/7393.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7393/head:pull/7393 PR: https://git.openjdk.java.net/jdk/pull/7393 From kbarrett at openjdk.java.net Thu Feb 10 05:54:45 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 10 Feb 2022 05:54:45 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue [v2] In-Reply-To: References: Message-ID: <-r7rCiSfhFsXpQqg-m1qPuMoPeFkcM92eBdAkWz7C40=.4efb33b4-1503-4edc-8635-8c0fb23db64b@github.com> On Wed, 9 Feb 2022 10:10:17 GMT, Ivan Walulya wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> walulyai review > > src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 120: > >> 118: // old_tail for extension. We won any races with try_pop by changing >> 119: // away from end-marker. So we're done. Note that ABA is possible; >> 120: // try_pop could take old_tail before our update, it gets recycled and > > "a concurrent try_pop could take...." Sure, I can make that explicit. ------------- PR: https://git.openjdk.java.net/jdk/pull/7393 From dholmes at openjdk.java.net Thu Feb 10 06:22:15 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 10 Feb 2022 06:22:15 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v7] In-Reply-To: References: Message-ID: On Wed, 9 Feb 2022 16:20:49 GMT, Emanuel Peter wrote: >> Deprecated ExtendedDTraceProbes. 
>> Edited help messages and man pages accordingly, added the 3 flags to man pages. >> Added flag to VMDeprecatedOptions test. >> Replaced the flag with 3 flags in SDTProbesGNULinuxTest.java. >> >> Checked that tests are not affected. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > updated warning messages and added 3 flags to man-pages Hi Emanuel, A few minor nits below. Thanks, David src/hotspot/share/runtime/globals.hpp line 1868: > 1866: product(bool, ExtendedDTraceProbes, false, \ > 1867: "(Deprecated) Enable performance-impacting dtrace probes. " \ > 1868: "Use a combination of -XX:+DTraceMethodProbes, " \ You missed changing 'a' to 'the' here. src/java.base/share/man/java.1 line 2977: > 2975: .RE > 2976: .TP > 2977: .B \f[CB]\-XX:+DTraceAllocProbes\f[R] The three newly documented flags should all be marked "Linux and macOS". It is somewhat of a poor design that the flags are available on all platforms but only have an effect on systems with DTrace or SystemTap support - which (for our main platforms) is Linux and macOS. src/java.base/share/man/java.1 line 4017: > 4015: .B \f[CB]\-XX:+ExtendedDTraceProbes\f[R] > 4016: \f[B]Linux and macOS:\f[R] Enables additional \f[CB]dtrace\f[R] tool probes > 4017: that affect the performance. Existing grammatical nit: please delete 'the'. src/java.base/share/man/java.1 line 4020: > 4018: By default, this option is disabled and \f[CB]dtrace\f[R] performs only > 4019: standard probes. > 4020: Use the combination of these flags instead: -XX:+DTraceMethodProbes, -XX:+DTraceAllocProbes, -XX:+DTraceMonitorProbes The flags should be in a code font (use `-XX:...` in the markdown source). 
------------- PR: https://git.openjdk.java.net/jdk/pull/7110 From dholmes at openjdk.java.net Thu Feb 10 06:29:09 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 10 Feb 2022 06:29:09 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue [v2] In-Reply-To: References: Message-ID: On Wed, 9 Feb 2022 04:16:33 GMT, Kim Barrett wrote: >> src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 122: >> >>> 120: // try_pop could take old_tail before our update, it gets recycled and >>> 121: // re-added to the end, and then we successfully cmpxchg, rendering the >>> 122: // list in _tail circular. >> >> Doesn't this contradict the "We won any races with try_pop ... so we're done"! > > The client of this class is expected to prevent ABA from occurring. Some of the mechanisms that might be used for doing so include separate phases for push/pop and preventing recycling while some thread might be in the midst of one of the problem operations. The only current user of this class is G1DirtyCardQueueSet, where a combination of GlobalCounter critical sections and safepoint boundaries are used to ensure ABA can't happen. Understood, but it still, to me, reads oddly to claim "we're done" and then have an ABA disclaimer. ------------- PR: https://git.openjdk.java.net/jdk/pull/7393 From dholmes at openjdk.java.net Thu Feb 10 06:54:07 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 10 Feb 2022 06:54:07 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue [v2] In-Reply-To: <5kzE-8-k1Npd6wlCu_kc2xyP1NyC7iJQA4ymcl5O1Ac=.0dd5114e-d4a2-4c39-970d-93995239323b@github.com> References: <5kzE-8-k1Npd6wlCu_kc2xyP1NyC7iJQA4ymcl5O1Ac=.0dd5114e-d4a2-4c39-970d-93995239323b@github.com> Message-ID: On Thu, 10 Feb 2022 05:54:43 GMT, Kim Barrett wrote: >> Please review this update of the usage and implementation comments for >> NonblockingQueue to discuss the ABA issue in push/append operations. 
> > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > walulyai review src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 122: > 120: // concurrent try_pop could take old_tail before our update, it gets > 121: // recycled and re-added to the end, and then we successfully cmpxchg, > 122: // rendering the list in _tail circular. Suggestions: 1. start the new comment on a new line 2. "Note that ABA would be possible if a concurrent try_pop takes old_tail before our update, ... Cheers, David ------------- PR: https://git.openjdk.java.net/jdk/pull/7393 From shade at openjdk.java.net Thu Feb 10 08:43:07 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 10 Feb 2022 08:43:07 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v5] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Tue, 8 Feb 2022 17:24:41 GMT, Aleksey Shipilev wrote: >> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. >> >> The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. >> >> This patch makes a pilot change for x86, without touching other architectures. 
Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration.
>>
>> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around.
>>
>> I think it is fairly complete, and so would like to solicit more feedback and testing here.
>>
>> Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions:
>>
>> compiler.compiler: +77%
>> compiler.sunflow: +69%
>> compress: +166%
>> crypto.rsa: +15%
>> crypto.signverify: +70%
>> mpegaudio: +8%
>> serial: +50%
>> sunflow: +57%
>> xml.transform: +61%
>> xml.validation: +43%
>>
>> My new `java.lang.invoke` benchmarks improve a lot as well:
>>
>> Benchmark               Mode  Cnt    Score   Error  Units
>>
>> # Mainline
>> MHInvoke.methodHandle   avgt    5  799.671 ± 9.087  ns/op
>> MHInvoke.plain          avgt    5  261.947 ± 1.421  ns/op
>> VHGet.plain             avgt    5  231.372 ± 3.044  ns/op
>> VHGet.varHandle         avgt    5  924.880 ± 6.026  ns/op
>>
>> # This WIP
>> MHInvoke.methodHandle   avgt    5  240.456 ± 3.931  ns/op
>> MHInvoke.plain          avgt    5   70.851 ± 1.986  ns/op
>> VHGet.plain             avgt    5   52.506 ± 3.768  ns/op
>> VHGet.varHandle         avgt    5  335.785 ± 4.398  ns/op
>>
>> It also palpably improves startup even on small HelloWorld, _even when compilers are present_:
>>
>> $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null
>>
>> Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs):
>>
>>          22.06 msec task-clock               #   1.030 CPUs utilized            ( +- 0.04% )
>>             96      context-switches         #   4.353 K/sec                    ( +- 0.07% )
>>              7      cpu-migrations           # 333.181 /sec                     ( +- 0.32% )
>>          2,437      page-faults              # 110.469 K/sec                    ( +- 0.00% )
>>     78,763,038      cycles                   #   3.571 GHz                      ( +- 0.05% ) (77.30%)
>>      2,107,182      stalled-cycles-frontend  #   2.68% frontend cycles idle     ( +- 0.41% ) (77.40%)
>>      2,235,371      stalled-cycles-backend   #   2.84% backend cycles idle      ( +- 1.05% ) (71.39%)
>>     67,296,528      instructions             #   0.85 insn per cycle
>>                                              #   0.03 stalled cycles per insn   ( +- 0.03% ) (89.79%)
>>     12,483,022      branches                 # 565.911 M/sec                    ( +- 0.01% ) (99.73%)
>>        384,412      branch-misses            #   3.08% of all branches          ( +- 0.07% ) (85.91%)
>>
>>      0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% )
>>
>> $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null
>>
>> Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs):
>>
>>          21.78 msec task-clock               #   1.031 CPUs utilized            ( +- 0.05% )
>>             98      context-switches         #   4.519 K/sec                    ( +- 0.07% )
>>              7      cpu-migrations           # 339.292 /sec                     ( +- 0.31% )
>>          2,434      page-faults              # 111.755 K/sec                    ( +- 0.00% )
>>     77,746,317      cycles                   #   3.569 GHz                      ( +- 0.05% ) (76.94%)
>>      2,143,121      stalled-cycles-frontend  #   2.76% frontend cycles idle     ( +- 0.45% ) (76.03%)
>>      2,059,440      stalled-cycles-backend   #   2.65% backend cycles idle      ( +- 1.11% ) (71.82%)
>>     66,742,892      instructions             #   0.86 insn per cycle
>>                                              #   0.03 stalled cycles per insn   ( +- 0.03% ) (91.40%)
>>     12,494,797      branches                 # 573.634 M/sec                    ( +- 0.01% ) (99.80%)
>>        386,145      branch-misses            #   3.09% of all branches          ( +- 0.08% ) (85.56%)
>>
>>      0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% )
>>
>> Additional
testing: >> - [x] Linux x86_64 fastdebug, `tier1` >> - [x] Linux x86_64 fastdebug, `tier2` >> - [x] Linux x86_64 fastdebug, `tier3` >> - [x] Linux x86_32 fastdebug, `tier1` >> - [x] Linux x86_32 fastdebug, `tier2` >> - [x] Linux x86_32 fastdebug, `tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Show watermark in better place on the chart All right, thanks for reviews! Last call for comments. I am planning to integrate it later today. ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From duke at openjdk.java.net Thu Feb 10 08:46:50 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Thu, 10 Feb 2022 08:46:50 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v8] In-Reply-To: References: Message-ID: <6_ddanyI-FFaerYCGBHYYGlJQZpUvypaIIoPOq6S3wM=.b77c72f2-e29a-4d31-826c-f42c737978d1@github.com> > Deprecated ExtendedDTraceProbes. > Edited help messages and man pages accordingly, added the 3 flags to man pages. > Added flag to VMDeprecatedOptions test. > Replaced the flag with 3 flags in SDTProbesGNULinuxTest.java. > > Checked that tests are not affected. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fixes to documentation requested by reviewers ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7110/files - new: https://git.openjdk.java.net/jdk/pull/7110/files/b05ecfa2..af11b456 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7110&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7110&range=06-07 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/7110.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7110/head:pull/7110 PR: https://git.openjdk.java.net/jdk/pull/7110 From duke at openjdk.java.net Thu Feb 10 08:46:55 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Thu, 10 Feb 2022 08:46:55 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v7] In-Reply-To: References: Message-ID: <2wP_TEQJtQvj6ALRgl4BG_QnXkLVqBntoZleXWDIDaU=.ccd0ae17-7ed8-4ec5-955b-476941ab37b9@github.com> On Thu, 10 Feb 2022 05:54:17 GMT, David Holmes wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> updated warning messages and added 3 flags to man-pages > > src/hotspot/share/runtime/globals.hpp line 1868: > >> 1866: product(bool, ExtendedDTraceProbes, false, \ >> 1867: "(Deprecated) Enable performance-impacting dtrace probes. " \ >> 1868: "Use a combination of -XX:+DTraceMethodProbes, " \ > > You missed changing 'a' to 'the' here. done > src/java.base/share/man/java.1 line 2977: > >> 2975: .RE >> 2976: .TP >> 2977: .B \f[CB]\-XX:+DTraceAllocProbes\f[R] > > The three newly documented flags should all be marked "Linux and macOS". It is somewhat of a poor design that the flags are available on all platforms but only have an effect on systems with DTrace or SystemTap support - which (for our main platforms) is Linux and macOS. 
done > src/java.base/share/man/java.1 line 4017: > >> 4015: .B \f[CB]\-XX:+ExtendedDTraceProbes\f[R] >> 4016: \f[B]Linux and macOS:\f[R] Enables additional \f[CB]dtrace\f[R] tool probes >> 4017: that affect the performance. > > Existing grammatical nit: please delete 'the'. done > src/java.base/share/man/java.1 line 4020: > >> 4018: By default, this option is disabled and \f[CB]dtrace\f[R] performs only >> 4019: standard probes. >> 4020: Use the combination of these flags instead: -XX:+DTraceMethodProbes, -XX:+DTraceAllocProbes, -XX:+DTraceMonitorProbes > > The flags should be in a code font (use `-XX:...` in the markdown source). done ------------- PR: https://git.openjdk.java.net/jdk/pull/7110 From aph at openjdk.java.net Thu Feb 10 09:57:16 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 10 Feb 2022 09:57:16 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v5] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: <6jyO3IUay3bgBazoCWsk_1BGQPaWvFHmqFFlYi6lH8k=.cb8fc84f-c9a4-4b35-be0c-7e2be7ede815@github.com> On Thu, 10 Feb 2022 08:40:11 GMT, Aleksey Shipilev wrote: > All right, thanks for reviews! Last call for comments. I am planning to integrate it later today. x86-32 has some weird stack handling, particularly when using the invocation interface. I guess we assume our regression tests will catch breakage there. 
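For readers following the 8072070 thread: the "bang only when needed" idea from the PR description can be pictured with a small model. This is a rough sketch under stated assumptions, not the code from the patch: `StackState`, `on_method_entry`, the plain integer addresses, and the exact comparisons are hypothetical, and a counter stands in for actually touching the shadow-zone pages. The stack is assumed to grow toward lower addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-thread bookkeeping; field names are illustrative only.
struct StackState {
  std::uintptr_t growth_watermark; // deepest sp whose shadow zone was already banged
  std::uintptr_t safe_limit;       // at or below this sp we are close to the guard zone
  int bangs;                       // counts simulated shadow-zone bangs
};

// Models one interpreter method entry at stack pointer `sp`.
// Returns true if the shadow zone was (re)banged on this entry.
bool on_method_entry(StackState& s, std::uintptr_t sp) {
  // Shallower than (or equal to) a depth we already banged from:
  // the pages below sp were touched earlier, so skip the bang.
  if (sp >= s.growth_watermark) {
    return false;
  }
  ++s.bangs; // stands in for touching every page of the shadow zone
  // Only advance the watermark while sp is above the safe limit, so that
  // entries near the guard zone keep banging every time (assumption).
  if (sp > s.safe_limit) {
    s.growth_watermark = sp;
  }
  return true;
}
```

In this model, repeated calls at the same or a shallower depth cost one comparison instead of a full bang, which is the effect the benchmark numbers in the PR description are attributed to.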
------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From shade at openjdk.java.net Thu Feb 10 10:12:13 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 10 Feb 2022 10:12:13 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v5] In-Reply-To: <6jyO3IUay3bgBazoCWsk_1BGQPaWvFHmqFFlYi6lH8k=.cb8fc84f-c9a4-4b35-be0c-7e2be7ede815@github.com> References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> <6jyO3IUay3bgBazoCWsk_1BGQPaWvFHmqFFlYi6lH8k=.cb8fc84f-c9a4-4b35-be0c-7e2be7ede815@github.com> Message-ID: <-WugBPZ3_skHnoOjDca2NP6leCXqX8UuMKsSK6kfIic=.e4f07c8e-6ac0-4bc7-a24b-72afadc29b9b@github.com> On Thu, 10 Feb 2022 09:54:01 GMT, Andrew Haley wrote: > > All right, thanks for reviews! Last call for comments. I am planning to integrate it later today. > > x86-32 has some weird stack handling, particularly when using the invocation interface. I guess we assume our regression tests will catch breakage there. As you can see in "Additional testing", I ran `tier{1,2,3}` on x86_32 without problems. It is hard to tell how this patch would break x86_32 though: it would still bang the same way when close to guard zone. ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From kbarrett at openjdk.java.net Thu Feb 10 10:12:44 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 10 Feb 2022 10:12:44 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue [v3] In-Reply-To: References: Message-ID: > Please review this update of the usage and implementation comments for > NonblockingQueue to discuss the ABA issue in push/append operations. 
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: dholmes review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7393/files - new: https://git.openjdk.java.net/jdk/pull/7393/files/fe7cc130..7aacc083 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7393&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7393&range=01-02 Stats: 7 lines in 1 file changed: 3 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/7393.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7393/head:pull/7393 PR: https://git.openjdk.java.net/jdk/pull/7393 From kbarrett at openjdk.java.net Thu Feb 10 10:12:47 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 10 Feb 2022 10:12:47 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue [v2] In-Reply-To: References: <5kzE-8-k1Npd6wlCu_kc2xyP1NyC7iJQA4ymcl5O1Ac=.0dd5114e-d4a2-4c39-970d-93995239323b@github.com> Message-ID: On Thu, 10 Feb 2022 06:50:42 GMT, David Holmes wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> walulyai review > > src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 122: > >> 120: // concurrent try_pop could take old_tail before our update, it gets >> 121: // recycled and re-added to the end, and then we successfully cmpxchg, >> 122: // rendering the list in _tail circular. > > Suggestions: > > 1. start the new comment on a new line > 2. "Note that ABA would be possible if a concurrent try_pop takes old_tail before our update, ... > > Cheers, > David OK, I separated the note and reworded it a bit. Better? 
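As an aside for readers of this thread, the ABA interleaving that the comment warns about can be replayed deterministically. The following is a minimal single-threaded sketch, not the NonblockingQueue code: `Node`, `is_circular`, and `replay_aba_on_append` are hypothetical names, `nullptr` stands in for the end marker, and the concurrent try_pop plus recycling is simulated inline rather than run on another thread.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical list node; nullptr plays the role of the end marker.
struct Node {
  std::atomic<Node*> next{nullptr};
};

// Returns true if following next pointers from `start` leads back to `start`
// within `limit` steps.
bool is_circular(Node* start, int limit) {
  Node* cur = start->next.load();
  for (int i = 0; i < limit && cur != nullptr; ++i) {
    if (cur == start) return true;
    cur = cur->next.load();
  }
  return false;
}

// Replays the problematic interleaving step by step.
bool replay_aba_on_append() {
  Node a, b;
  std::atomic<Node*> tail{&a};          // queue holds just `a`

  // Appender, step 1: install b as the new tail and remember the old tail.
  Node* old_tail = tail.exchange(&b);   // old_tail == &a, a.next is still the end marker

  // Simulated concurrent activity: `a` is taken by a pop (not modeled here),
  // recycled, and appended again behind b.
  a.next.store(nullptr);                // recycled node looks fresh
  Node* t = tail.exchange(&a);          // t == &b
  t->next.store(&a);                    // b -> a, a.next == end marker

  // Appender, step 2: the CAS still sees the end marker in old_tail->next,
  // so it succeeds even though `a` is no longer the node it used to be.
  Node* expected = nullptr;
  bool linked = old_tail->next.compare_exchange_strong(expected, &b);

  // Result: a -> b -> a, i.e. the list reachable from the tail is circular.
  return linked && is_circular(&a, 8);
}
```

This is why the usage comments leave it to the client to rule out such recycling, for example with the phase separation or critical sections mentioned earlier in the thread.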
------------- PR: https://git.openjdk.java.net/jdk/pull/7393 From dholmes at openjdk.java.net Thu Feb 10 10:50:11 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 10 Feb 2022 10:50:11 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v8] In-Reply-To: <6_ddanyI-FFaerYCGBHYYGlJQZpUvypaIIoPOq6S3wM=.b77c72f2-e29a-4d31-826c-f42c737978d1@github.com> References: <6_ddanyI-FFaerYCGBHYYGlJQZpUvypaIIoPOq6S3wM=.b77c72f2-e29a-4d31-826c-f42c737978d1@github.com> Message-ID: On Thu, 10 Feb 2022 08:46:50 GMT, Emanuel Peter wrote: >> Deprecated ExtendedDTraceProbes. >> Edited help messages and man pages accordingly, added the 3 flags to man pages. >> Added flag to VMDeprecatedOptions test. >> Replaced the flag with 3 flags in SDTProbesGNULinuxTest.java. >> >> Checked that tests are not affected. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fixes to documentation requested by reviewers Thanks for the updates. David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7110 From dholmes at openjdk.java.net Thu Feb 10 10:55:05 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 10 Feb 2022 10:55:05 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue [v3] In-Reply-To: References: Message-ID: <48Qi6eikXOhu71mgUEc3PE-CYTHJJTpZxOVzbl6vPtQ=.66340aa2-f28c-4569-92c7-993ca4d78b1b@github.com> On Thu, 10 Feb 2022 10:12:44 GMT, Kim Barrett wrote: >> Please review this update of the usage and implementation comments for >> NonblockingQueue to discuss the ABA issue in push/append operations. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > dholmes review Thanks - reads well. David ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7393 From kbarrett at openjdk.java.net Thu Feb 10 11:31:41 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 10 Feb 2022 11:31:41 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue [v4] In-Reply-To: References: Message-ID: > Please review this update of the usage and implementation comments for > NonblockingQueue to discuss the ABA issue in push/append operations. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into nbq-aba - dholmes review - walulyai review - document ABA for push/append ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7393/files - new: https://git.openjdk.java.net/jdk/pull/7393/files/7aacc083..4f03fd58 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7393&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7393&range=02-03 Stats: 3101 lines in 57 files changed: 2579 ins; 211 del; 311 mod Patch: https://git.openjdk.java.net/jdk/pull/7393.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7393/head:pull/7393 PR: https://git.openjdk.java.net/jdk/pull/7393 From kbarrett at openjdk.java.net Thu Feb 10 11:31:43 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 10 Feb 2022 11:31:43 GMT Subject: RFR: 8280832: Update usage docs for NonblockingQueue [v4] In-Reply-To: References: Message-ID: <6RaGYU4SOe70MtRx7l7SoqoaqMOv0KTrp__xCI97GX4=.34730bb9-58c4-4d6d-8f5f-574e6fe19463@github.com> On Wed, 9 Feb 2022 10:10:35 GMT, Ivan Walulya wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into nbq-aba >> - dholmes review >> - walulyai review >> - document ABA for push/append > > Minor suggestion Thanks @walulyai and @dholmes-ora for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/7393 From kbarrett at openjdk.java.net Thu Feb 10 11:31:44 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 10 Feb 2022 11:31:44 GMT Subject: Integrated: 8280832: Update usage docs for NonblockingQueue In-Reply-To: References: Message-ID: On Tue, 8 Feb 2022 22:58:26 GMT, Kim Barrett wrote: > Please review this update of the usage and implementation comments for > NonblockingQueue to discuss the ABA issue in push/append operations. This pull request has now been integrated. Changeset: 3ce1c5b6 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/3ce1c5b6ce02749ef8f9d35409b7bcbf27f47203 Stats: 9 lines in 2 files changed: 8 ins; 0 del; 1 mod 8280832: Update usage docs for NonblockingQueue Reviewed-by: iwalulya, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/7393 From jbhateja at openjdk.java.net Thu Feb 10 12:24:14 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 10 Feb 2022 12:24:14 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts In-Reply-To: References: Message-ID: <1EkBcO28e83W0erDN6flFX6eR88aovKxVIGJqOiF40I=.5db87001-570d-4679-9b3a-7937b72233ed@github.com> On Sat, 5 Feb 2022 15:34:08 GMT, Quan Anh Mai wrote: > Hi, > > This patch implements the unsigned upcast intrinsics in x86, which are used in vector lane-wise reinterpreting operations. > > Thank you very much. src/hotspot/cpu/x86/x86.ad line 7288: > 7286: break; > 7287: default: assert(false, "%s", type2name(to_elem_bt)); > 7288: } Please move this into a macro assembly routine. 
src/hotspot/cpu/x86/x86.ad line 7310: > 7308: default: assert(false, "%s", type2name(to_elem_bt)); > 7309: } > 7310: %} Same as above. ------------- PR: https://git.openjdk.java.net/jdk/pull/7358 From ayang at openjdk.java.net Thu Feb 10 13:32:42 2022 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 10 Feb 2022 13:32:42 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock [v2] In-Reply-To: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: > This PR consists of two commits: > > 1. remove `ExpandHeap_lock` in Serial GC code. > 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. > > Test: tier1-6 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7124/files - new: https://git.openjdk.java.net/jdk/pull/7124/files/16874ed0..8e98e826 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7124&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7124&range=00-01 Stats: 40 lines in 6 files changed: 5 ins; 25 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/7124.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7124/head:pull/7124 PR: https://git.openjdk.java.net/jdk/pull/7124 From ayang at openjdk.java.net Thu Feb 10 13:32:44 2022 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 10 Feb 2022 13:32:44 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock In-Reply-To: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: On Tue, 18 Jan 2022 12:03:46 GMT, Albert Mingkun Yang wrote: > 
This PR consists of two commits: > > 1. remove `ExpandHeap_lock` in Serial GC code. > 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. > > Test: tier1-6 I have moved the mutex inside `PSOldGen` to reduce its scope. ------------- PR: https://git.openjdk.java.net/jdk/pull/7124 From ayang at openjdk.java.net Thu Feb 10 14:58:46 2022 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 10 Feb 2022 14:58:46 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock [v3] In-Reply-To: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: > This PR consists of two commits: > > 1. remove `ExpandHeap_lock` in Serial GC code. > 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. > > Test: tier1-6 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: fix release build ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7124/files - new: https://git.openjdk.java.net/jdk/pull/7124/files/8e98e826..fa5dcce9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7124&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7124&range=01-02 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7124.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7124/head:pull/7124 PR: https://git.openjdk.java.net/jdk/pull/7124 From volker.simonis at gmail.com Thu Feb 10 15:08:06 2022 From: volker.simonis at gmail.com (Volker Simonis) Date: Thu, 10 Feb 2022 16:08:06 +0100 Subject: Internal compiler error for slowdebug build with gcc 7.5.0 on Ubuntu 18.04 Message-ID: Hi, When compiling the latest HS sources in slowdebug mode with gcc 7.5.0 (the default compiler on Ubuntu 18.04) I get the following internal compiler 
error for the file compileBroker.cpp:

/OpenJDK/Git/jdk/src/hotspot/share/compiler/compileBroker.cpp: In static member function 'static void CompileBroker::invoke_compiler_on_method(CompileTask*)':
/OpenJDK/Git/jdk/src/hotspot/share/compiler/compileBroker.cpp:2393:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)
 }
 ^
Please submit a full bug report, with preprocessed source if appropriate.
See for instructions.

I know that gcc 7.5.0 isn't officially supported but was just curious if somebody has seen this before? Googling around shows that this issue seems to have been fixed several times in gcc 4.9 and specifically for ppc/rs6000.

I've installed and tried gcc 8.4.0 but the error remains the same:

GNU C++14 (Ubuntu 8.4.0-1ubuntu1~18.04) version 8.4.0 (x86_64-linux-gnu)
compiled by GNU C version 8.4.0, GMP version 6.1.2, MPFR version 4.0.1, MPC version 1.1.0, isl version isl-0.19-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU assembler version 2.30 (x86_64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.30
Compiler executable checksum: 67fba09f596cc8a67df33f8529603bfb
during RTL pass: reload
/OpenJDK/Git/jdk/src/hotspot/share/compiler/compileBroker.cpp: In static member function ‘static void CompileBroker::invoke_compiler_on_method(CompileTask*)’:
/OpenJDK/Git/jdk/src/hotspot/share/compiler/compileBroker.cpp:2393:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)
 }
 ^
Please submit a full bug report, with preprocessed source if appropriate.
See for instructions.

According to the "Supported Build Platforms" Wiki [1] it seems that at least SAP is using gcc 8. Have you run into this issue as well? Any ideas how to fix it without upgrading to gcc 10?
Thank you and best regards, Volker PS: the release build works perfectly fine with gcc 7.5.0 [1] https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms From duke at openjdk.java.net Thu Feb 10 15:14:44 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Thu, 10 Feb 2022 15:14:44 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch implements the unsigned upcast intrinsics in x86, which are used in vector lane-wise reinterpreting operations. > > Thank you very much. Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: - minor rename - address reviews ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7358/files - new: https://git.openjdk.java.net/jdk/pull/7358/files/22a70fe1..8028be52 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7358&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7358&range=00-01 Stats: 81 lines in 4 files changed: 32 ins; 44 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/7358.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7358/head:pull/7358 PR: https://git.openjdk.java.net/jdk/pull/7358 From duke at openjdk.java.net Thu Feb 10 15:14:46 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Thu, 10 Feb 2022 15:14:46 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2] In-Reply-To: References: Message-ID: On Wed, 9 Feb 2022 22:52:47 GMT, Sandhya Viswanathan wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - minor rename >> - address reviews > > src/hotspot/cpu/x86/assembler_x86.cpp line 4782: > >> 4780: vector_len == AVX_256bit? VM_Version::supports_avx2() : >> 4781: vector_len == AVX_512bit? 
VM_Version::supports_evex() : 0, " "); >> 4782: InstructionAttr attributes(vector_len, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ true); > > legacy_mode should be false here instead of _legacy_mode_bw. Fixed, thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/7358 From duke at openjdk.java.net Thu Feb 10 15:14:49 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Thu, 10 Feb 2022 15:14:49 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2] In-Reply-To: <1EkBcO28e83W0erDN6flFX6eR88aovKxVIGJqOiF40I=.5db87001-570d-4679-9b3a-7937b72233ed@github.com> References: <1EkBcO28e83W0erDN6flFX6eR88aovKxVIGJqOiF40I=.5db87001-570d-4679-9b3a-7937b72233ed@github.com> Message-ID: <1U-v8HDdffTAyMecRVwaQhZUi3mmITIGDpuXsbHni5o=.b0bc2c3f-ac7f-4c0d-831c-7586673d5aea@github.com> On Thu, 10 Feb 2022 05:05:05 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - minor rename >> - address reviews > > src/hotspot/cpu/x86/x86.ad line 7288: > >> 7286: break; >> 7287: default: assert(false, "%s", type2name(to_elem_bt)); >> 7288: } > > Please move this into a macro assembly routine. Fixed, thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/7358 From lkorinth at openjdk.java.net Thu Feb 10 15:47:23 2022 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Thu, 10 Feb 2022 15:47:23 GMT Subject: RFR: 8281585: Remove unused imports under test/lib and jtreg/gc Message-ID: Remove unused imports under test/lib and jtreg/gc. They create lots of warnings if editing using an IDE. Tests in hotspot_gc passed. 
------------- Commit messages: - 8281585: Remove unused imports under test/lib and jtreg/gc Changes: https://git.openjdk.java.net/jdk/pull/7426/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7426&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281585 Stats: 92 lines in 60 files changed: 0 ins; 92 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7426.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7426/head:pull/7426 PR: https://git.openjdk.java.net/jdk/pull/7426 From aph at openjdk.java.net Thu Feb 10 16:21:12 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 10 Feb 2022 16:21:12 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <32e7_CnkkIaj2GOsvi9mT-xzgLO8B60uHrzMEAZXHko=.2ea9eaff-39c6-4401-9820-4536f03d5ec7@github.com> Message-ID: <9wCVZ8gCStf_tUT8_WQjhLzXqqQlQMsijeiBaAXDVVk=.aace6af6-bf1b-40c9-ba19-6fd0ab9b1b0a@github.com> On Tue, 8 Feb 2022 09:40:49 GMT, Andrew Haley wrote: >> Doing this caused 7 failures across a full jtreg run, namely: >> >> serviceability/sa/ClhsdbFindPC.java#xcomp-core >> vmTestbase/jit/misctests/fpustack/GraphApplet.java >> vmTestbase/nsk/jdi/MonitorWaitRequest/MonitorWaitRequest001/TestDescription.java >> vmTestbase/nsk/jdi/MonitorWaitedRequest/MonitorWaitedRequest001/TestDescription.java >> vmTestbase/nsk/jdwp/ThreadReference/ForceEarlyReturn/forceEarlyReturn002/forceEarlyReturn002.java >> vmTestbase/nsk/jdwp/ThreadReference/OwnedMonitorsStackDepthInfo/ownedMonitorsStackDepthInfo002/ownedMonitorsStackDepthInfo002.java >> vmTestbase/nsk/jvmti/RedefineClasses/StressRedefine/TestDescription.java >> >> ....I'll investigate. > >> Doing this caused 7 failures across a full jtreg run, namely: > > I'm glad we caught that. Status? Is branch protection really incompatible with PreserveFramePointer? 
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Thu Feb 10 16:39:52 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Thu, 10 Feb 2022 16:39:52 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: <-nQf8_Gh666U_KH2wCMBEApxI3GFXre1cghHN41KoVg=.c0bc85fd-16ed-49f5-a595-73893facf6df@github.com> References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <1oSiO-f26IoFOcPDhOOeWrr8x2cH_Wyv4aAjI9gX9-0=.21f677c9-61a4-469e-891c-f35bc469b7e2@github.com> <-V7ptCS4QdcpFHOomMnTPPYvFtKSQ0nswzFNXQDoWLg=.2d72897f-ef45-4867-892f-64df085eca85@github.com> <-nQf8_Gh666U_KH2wCMBEApxI3GFXre1cghHN41KoVg=.c0bc85fd-16ed-49f5-a595-73893facf6df@github.com> Message-ID: On Mon, 7 Feb 2022 15:12:04 GMT, Andrew Haley wrote:

>> How about extending the existing enter() function:
>>
>> // Enter a new stack frame for the current method.
>> // nested: Indicates a frame has already been entered (and not left) for the current method.
>> void MacroAssembler::enter(bool nested=false) {
>>   if (nested) strip()
>>   protect()
>>   stp()
>>   mov()
>> }
>>
>> This would add an additional bool check for every call of enter() - that's at code generation time, so probably not an issue.
>
> So, `nested` is true iff we are, say, pushing an extra frame for a runtime call in the middle of generated code, but for some mysterious reason the logic is inline instead of being implemented in the obvious way as a stub.
>
> Please do this as:
>
> ` MacroAssembler::enter(bool strip_return_address=false)`
>
> and I'll be happy. Please make sure that all calls are commented, as in
>
> `__ enter(/*strip_return_address*/true);`
>
> and I'll be happy.

Just about to resolve this ... then spotted the "make sure that all calls are commented". Will fix up.
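
For readers following the thread, the shape of the requested `enter(bool strip_return_address=false)` can be sketched with a toy recorder. This is only an illustration under stated assumptions: `MockAssembler` and `steps_for_enter` are hypothetical names, the class merely records step names taken from the pseudocode quoted above, and none of it is HotSpot's actual `MacroAssembler`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an assembler: it only records the names of the
// steps from the quoted pseudocode. It emits no real instructions.
class MockAssembler {
 public:
  // strip_return_address: true when a frame has already been entered (and
  // not left) for the current method, as described in the thread.
  void enter(bool strip_return_address = false) {
    if (strip_return_address) {
      emit("strip");   // assumed: removes an existing signature first
    }
    emit("protect");   // assumed: signs the return address
    emit("stp");       // store pair (frame pointer and link register)
    emit("mov");       // set up the new frame pointer
  }

  const std::vector<std::string>& emitted() const { return emitted_; }

 private:
  void emit(const std::string& step) { emitted_.push_back(step); }
  std::vector<std::string> emitted_;
};

// Helper that returns the recorded steps for one call of enter().
std::vector<std::string> steps_for_enter(bool strip_return_address) {
  MockAssembler masm;
  masm.enter(/*strip_return_address*/ strip_return_address);
  return masm.emitted();
}
```

The call inside `steps_for_enter` follows the commenting style requested in the review, with the argument name spelled out at the call site.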
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Thu Feb 10 16:39:51 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Thu, 10 Feb 2022 16:39:51 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v20] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. 
If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Merge enter_subframe into enter ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/614a3262..f779513b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=19 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=18-19 Stats: 20 lines in 5 files changed: 5 ins; 9 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Thu Feb 10 16:39:53 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Thu, 10 Feb 2022 16:39:53 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: <9wCVZ8gCStf_tUT8_WQjhLzXqqQlQMsijeiBaAXDVVk=.aace6af6-bf1b-40c9-ba19-6fd0ab9b1b0a@github.com> References: 
<8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <32e7_CnkkIaj2GOsvi9mT-xzgLO8B60uHrzMEAZXHko=.2ea9eaff-39c6-4401-9820-4536f03d5ec7@github.com> <9wCVZ8gCStf_tUT8_WQjhLzXqqQlQMsijeiBaAXDVVk=.aace6af6-bf1b-40c9-ba19-6fd0ab9b1b0a@github.com> Message-ID: On Thu, 10 Feb 2022 16:18:18 GMT, Andrew Haley wrote: >>> Doing this caused 7 failures across a full jtreg run, namely: >> >> I'm glad we caught that. > > Status? Is branch protection really incompatible with PreserveFramePointer? Eventually found a missing signing in the exception handling. I'm running the full suite now, so should hopefully get something posted tomorrow. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From coleenp at openjdk.java.net Thu Feb 10 17:14:06 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 10 Feb 2022 17:14:06 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock [v3] In-Reply-To: References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: On Thu, 10 Feb 2022 14:58:46 GMT, Albert Mingkun Yang wrote: >> This PR consists of two commits: >> >> 1. remove `ExpandHeap_lock` in Serial GC code. >> 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > fix release build src/hotspot/share/gc/parallel/psOldGen.cpp line 48: > 46: #else > 47: _Expand_lock(Mutex::safepoint, "PSOldGenExpand_lock", true) > 48: #endif As per our coding convention, this should be _expand_lock. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7124 From ayang at openjdk.java.net Thu Feb 10 17:23:42 2022 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 10 Feb 2022 17:23:42 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock [v4] In-Reply-To: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: > This PR consists of two commits: > > 1. remove `ExpandHeap_lock` in Serial GC code. > 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. > > Test: tier1-6 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: lower case ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7124/files - new: https://git.openjdk.java.net/jdk/pull/7124/files/fa5dcce9..d5a2a9ca Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7124&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7124&range=02-03 Stats: 11 lines in 2 files changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/7124.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7124/head:pull/7124 PR: https://git.openjdk.java.net/jdk/pull/7124 From ayang at openjdk.java.net Thu Feb 10 17:23:46 2022 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 10 Feb 2022 17:23:46 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock [v3] In-Reply-To: References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: On Thu, 10 Feb 2022 17:10:44 GMT, Coleen Phillimore wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> fix release build > > src/hotspot/share/gc/parallel/psOldGen.cpp line 48: > >> 46: #else >> 47: 
_Expand_lock(Mutex::safepoint, "PSOldGenExpand_lock", true) >> 48: #endif > > As per our coding convention, this should be _expand_lock. Renamed. ------------- PR: https://git.openjdk.java.net/jdk/pull/7124 From mdoerr at openjdk.java.net Thu Feb 10 17:46:08 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 10 Feb 2022 17:46:08 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives In-Reply-To: References: Message-ID: On Wed, 26 Jan 2022 12:51:31 GMT, Claes Redestad wrote: > I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range which only has positive bytes. This allows for dealing much more efficiently with those `byte[]`s that has a ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input. > > Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904 > > - Only implemented on x86 for now, but I want to verify that implementations of `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. This pretty much means holding up this until it's implemented on all platforms, which can either contributed to this PR or as dependent follow-ups. > > - An alternative to holding up until all platforms are on board is to allow the implementation of `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented. 
One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field. > > - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`. Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even see a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix). Hi Claes, it can get implemented similarly on PPC64: https://github.com/openjdk/jdk/pull/7430 You can integrate it if you prefer that, but better after it got a Review. ------------- PR: https://git.openjdk.java.net/jdk/pull/7231 From psandoz at openjdk.java.net Thu Feb 10 18:31:05 2022 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 10 Feb 2022 18:31:05 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2] In-Reply-To: References: Message-ID: On Thu, 10 Feb 2022 15:14:44 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch implements the unsigned upcast intrinsics in x86, which are used in vector lane-wise reinterpreting operations. >> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - minor rename > - address reviews Running some tests. ------------- PR: https://git.openjdk.java.net/jdk/pull/7358 From psandoz at openjdk.java.net Thu Feb 10 18:59:05 2022 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Thu, 10 Feb 2022 18:59:05 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2] In-Reply-To: References: Message-ID: On Thu, 10 Feb 2022 15:14:44 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch implements the unsigned upcast intrinsics in x86, which are used in vector lane-wise reinterpreting operations. 
>> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - minor rename > - address reviews Observing the following failures on CPUs with "Intel_R__Xeon_R__Gold_6354_CPU___3.00GHz" with HotSpot flags: -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation TestVectorCastAVX512.java: Failed IR Rules (1) ------------------ - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUI256toL512(int[],long[])": * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\\\d+(\\\\s){2}(VectorUCastI2X.*)+(\\\\s){2}===.*)", "1"}, applyIfNot={})" - counts: Graph contains wrong number of nodes: Regex 1: (\\d+(\\s){2}(VectorUCastI2X.*)+(\\s){2}===.*) Expected 1 but found 0 nodes. TestVectorCastAVX1.java: - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toS64(byte[],short[])": * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\\\d+(\\\\s){2}(VectorUCastB2X.*)+(\\\\s){2}===.*)", "1"}, applyIfNot={})" - counts: Graph contains wrong number of nodes: Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*) Expected 1 but found 0 nodes. - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toI128(byte[],int[])": * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\\\d+(\\\\s){2}(VectorUCastB2X.*)+(\\\\s){2}===.*)", "1"}, applyIfNot={})" - counts: Graph contains wrong number of nodes: Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*) Expected 1 but found 0 nodes. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7358 From mgronlun at openjdk.java.net Thu Feb 10 19:18:13 2022 From: mgronlun at openjdk.java.net (Markus Grönlund) Date: Thu, 10 Feb 2022 19:18:13 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. In-Reply-To: References: Message-ID: On Wed, 26 Jan 2022 06:41:41 GMT, KIRIYAMA Takuya wrote: > I think JFR should report an error message and jvm should shut down safely instead of guarantee failure. > > For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops jvm as below > by using JfrJavaSupport::abort(). > > [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... > > I modified StreamWriterHost not to call guarantee failure but to call JfrJavaSupport::abort(). > I added an argument to JfrJavaSupport::abort() which tells os::abort() not to put out core > because there is no space on device. > Could you please review the fix? Hi Takuya, thanks for your contribution. src/hotspot/share/jfr/jni/jfrJavaSupport.hpp line 103: > 101: > 102: // critical > 103: static void abort(jstring errorMsg, TRAPS, bool dump_core=true); Not sure this is necessary. The existing core dump logic already handles the case where a core file cannot be generated due to disk full. test/hotspot/jtreg/runtime/jfr/TestJFRDiskFull.java line 127: > 125: raf.close(); > 126: } > 127: } I appreciate the effort, but we can't have a test that intentionally provokes a disk full situation. 
Instead, the updated error message will have to be manually verified. ------------- Changes requested by mgronlun (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7227 From jamil.j.nimeh at oracle.com Thu Feb 10 21:03:14 2022 From: jamil.j.nimeh at oracle.com (Jamil Nimeh) Date: Thu, 10 Feb 2022 13:03:14 -0800 Subject: Questions re: loading 512-bit constant data through ExternalAddress calls Message-ID: <3b8b37ae-321b-6bcc-1df6-14f308581e72@oracle.com>

Hello all,

Sorry in advance for the long email, but I thought it might be good to give a little background on what I'm trying to do:

I have an intrinsic that I'm working on for the ChaCha20 block function. I have versions of it to support different processor capabilities, specifically SSE2+AVX, AVX2 and AVX512. The first two are working great. The AVX512 is giving me some headaches with a couple specific instructions.

I prototyped all of these in C using inline assembly before I got down to playing in hotspot and for the AVX512 implementation, there are a few places where one of the arguments for the EVEX.512 variant of vpaddd would be literal data at a memory location. I achieved this in assembly like this:

// state is backed by uint32[16] and keystream is uint8[256]
void cc2Ax512(uint32_t *state, uint8_t *keystream) {
    asm (
        ".data;"
        "ctrAddMaskAvx512:;"
            ".long 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0;"
        ".text;"
            // load data into zmm0/1/2/3 - that all works fine
            "vpaddd %%zmm3, %%zmm3, [ctrAddMaskAvx512];"
            // complete the rest of the function
        : // No output registers
        : "m"(state), "m"(keystream)
        : "rbx", "rdx", "ecx"
    );
}

I didn't put the whole routine in there for brevity, but the data loads from that ctrAddMaskAvx512 address and adds properly to the values already in zmm3 at the time of the vpaddd.
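
As a side note on the data layout: the sixteen 32-bit values in the `.long` directive occupy 64 bytes, so the same table could also be written as eight 64-bit words. The sketch below shows that packing, assuming a little-endian layout (as on x86) in which the lower-indexed lane lands in the low half of each word. The names `kCtrAddMask` and `pack_lanes_le` are made up for this illustration and are not taken from HotSpot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// The sixteen 32-bit lanes from the .long directive in the prototype.
constexpr std::array<std::uint32_t, 16> kCtrAddMask = {
    0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0};

// Pack consecutive pairs of 32-bit lanes into 64-bit words, with the
// lower-indexed lane in the low 32 bits (little-endian memory order).
std::array<std::uint64_t, 8> pack_lanes_le(
    const std::array<std::uint32_t, 16>& lanes) {
  std::array<std::uint64_t, 8> words{};
  for (std::size_t i = 0; i < words.size(); ++i) {
    words[i] = static_cast<std::uint64_t>(lanes[2 * i]) |
               (static_cast<std::uint64_t>(lanes[2 * i + 1]) << 32);
  }
  return words;
}
```

With this packing, words 2, 4 and 6 carry the values 1, 2 and 3, and the remaining five words are zero.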
When it came time to try the equivalent approach in hotspot, I looked around for anything else that might be doing this and I found some examples of constant values being created in functions and passed into what look like ExternalAddress calls (or are they constructors? I haven't gone looking in-depth yet on that one). I used the ghash_shufflemask_addr() as a template for what I thought I was supposed to do:

  * in stubGenerator_x86_64.cpp I created a function chacha20_ctradd_avx512() that is basically 8 emit_data64() calls with the 512 bits of data I wanted to reference. The ghash function I was using as a reference only writes 128-bits, but otherwise my function is put together the same way.
  * further down in stubGenerator_x86_64.cpp I also assign this function to a StubRoutines::x86 field, "StubRoutines::x86::_chacha20_counter_addmask_avx512 = chacha20_ctradd_avx512();"
  * in stubRoutines_x86.cpp/hpp I define "_chacha20_counter_addmask_avx512" and create a method chacha20_counter_addmask_avx512() that simply returns _chacha20_counter_addmask_avx512. All of this follows the ghash_shufflemask_addr approach.
  * Finally when I wish to use it, say for a vpaddd call, it would look something like this:
      o __ vpaddd(zmm_dVec, zmm_dVec, ExternalAddress(StubRoutines::x86::chacha20_counter_addmask_avx512()), Assembler::AVX_512bit, rax);
      o By comparison, the ghash approach I was using as a template is used with movdqu, so it's just a 128-bit move, but the source is an ExternalAddress similar to what I'm doing so I thought the technique would work more or less for EVEX variants that can have memory source addresses.

This all compiles and for some weird reason I saw it actually work correctly one time. But most of the time the output after the add call is completely unrecognizable, as if it's adding data from some oddball address. If I comment out that vpaddd statement, the data in the register is exactly what I would expect it to be before the add takes place. So I'm fairly confident that the statements before that particular add are correct.

Here's where it gets weird. I have a similar method for my AVX2 version of the intrinsic. In that case, it's only doing 4 emit_data64 calls, and it passes it the same way into a vpaddd, but of course the vector_len is Assembler::AVX_256bit. It works perfectly every time. I don't have a good sense of why it's always working there but not with my 512-bit counterpart.

I could definitely use some hotspot insights. This approach in general was my best guess at loading/using 512-bit literals as source arguments but I'm definitely open to alternatives. I have also tried the built-in generate_vector_custom_i32() function since that would allow me to do away with my own custom functions, so long as it doesn't hurt from a performance standpoint. It seems to fail in the same way that my own functions do.

I am fairly new to assembly and these intrinsics so if you have suggestions/comments bear in mind that I don't eat/sleep/breathe hotspot like I would imagine some of the folks on this alias do. :) But at least from a functional perspective, I know once I can get these literal 512-bit values working the rest of the intrinsic function should fall into place because my C/assembly prototype works like a champ for all vector length variants.

Definitely open to your insights/comments,

Thanks,
--Jamil

From eastig at amazon.co.uk Thu Feb 10 23:02:08 2022 From: eastig at amazon.co.uk (Astigeevich, Evgeny) Date: Thu, 10 Feb 2022 23:02:08 +0000 Subject: RFC: AArch64: Set Segmented CodeCache default size to 127M Message-ID: <64AB1C1E-4151-4979-BF15-CC71D00E98DB@amazon.com>

Hello,

We'd like to discuss a proposal for setting TieredCompilation Segmented CodeCache default size to 127M on AArch64 (https://bugs.openjdk.java.net/browse/JDK-8280150).

The current default size of TieredCompilation CodeCache is 240M: 116M "non-profiled" segment + 116M "profiled" segment + 8M "non-nmethods" segment.
On AArch64, direct calls and jumps have a range limited to 128M. The C1/C2 compilers generate far jumps, calls and trampolines to overcome this limitation. They use MacroAssembler::far_branches, which compares ReservedCodeCacheSize with the direct jumps/calls range. With a 240M CodeCache the JIT has to use far jumps/trampolines. Such far jumps/trampolines result in performance and code size overhead. Our observations [1] suggest most applications running on AArch64 platforms have hot code not exceeding 128M.

AArch64 has a default ReservedCodeCacheSize of 48M. For tiered compilation the value is multiplied by 5, getting it to 240M.

We experimented with this CodeCache configuration: 48M "non-profiled" segment + 48M "profiled" segment + 8M "non-nmethods" segment. We ran SpecJbb2015, DaCapo at f480064 (https://github.com/dacapobench/dacapobench/tree/dev-chopin), Renaissance 0.14, and internal services. We did not see any statistically significant regressions. SpecJbb improved max-jOPS by +1.68% and critical-jOPS by +1.34%. For DaCapo, eclipse improved by 3.57%, tomcat by 1.45% and tradesoap by 3.03%. Only two Renaissance benchmarks had statistically significant results: dotty (+9.0%) and finagle-http (+3.9%). Others had changes which were comparable with the coefficient of variation. All benchmarks had significant decreases in max use of the non-profiled and profiled segments (see data below).

To mitigate the risk of 104M not being enough, we'd like to change the default size of TieredCompilation CodeCache to 127M (which is just below the size where the JIT would generate far jumps and trampolines): 60M "non-profiled" segment + 60M "profiled" segment + 7M "non-nmethods" segment. We did partial runs with 127M CodeCache. Their results were similar to the 104M configuration.
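
The segment arithmetic behind the three configurations can be checked with a few lines of C++. This is a minimal sketch using only the sizes quoted in this message. The struct, the constant names and `needs_far_branches` are invented for the illustration, and the comparison is a simplified model of the check described above, not a copy of the HotSpot code.

```cpp
#include <cassert>

// All sizes are in megabytes and come from the proposal text.
constexpr int kDirectBranchRangeMB = 128;  // AArch64 direct call/jump range
constexpr int kBaseReservedMB = 48;        // default ReservedCodeCacheSize
constexpr int kTieredMultiplier = 5;       // applied for tiered compilation

struct SegmentedCodeCache {
  int non_profiled_mb;
  int profiled_mb;
  int non_nmethods_mb;

  constexpr int total_mb() const {
    return non_profiled_mb + profiled_mb + non_nmethods_mb;
  }
};

// Simplified model: far branches and trampolines are needed once the
// reserved code cache is larger than the direct branch range.
constexpr bool needs_far_branches(const SegmentedCodeCache& cc) {
  return cc.total_mb() > kDirectBranchRangeMB;
}

constexpr SegmentedCodeCache kCurrentDefault{116, 116, 8};
constexpr SegmentedCodeCache kExperiment{48, 48, 8};
constexpr SegmentedCodeCache kProposed{60, 60, 7};

static_assert(kBaseReservedMB * kTieredMultiplier == kCurrentDefault.total_mb(),
              "48M x 5 matches the 240M tiered default");
static_assert(needs_far_branches(kCurrentDefault), "240M exceeds 128M");
static_assert(!needs_far_branches(kExperiment), "104M fits in 128M");
static_assert(!needs_far_branches(kProposed), "127M fits in 128M");
```

Only the 240M default crosses the 128M range. Both the 104M experiment and the proposed 127M layout stay below it, which is the property the proposal relies on.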
Average maximum used memory (KB) per segment (we checked that the numbers of compiled methods were similar in both cases). In each table, the first three data columns are the current 240M configuration and the next three are the 104M configuration:

NPS=non-profiled segment
PS=profiled segment
NNS=non-nmethods segment

SpecJbb
+----------+---------+--------+---------+--------+--------+----------+---------+----------+
| 116M NPS | 116M PS | 8M NNS | 48M NPS | 48M PS | 8M NNS | diff NPS | diff PS | diff NNS |
+----------+---------+--------+---------+--------+--------+----------+---------+----------+
| 12491    | 13968   | 4274   | 10649   | 12276  | 4234   | -14.7%   | -12.1%  | -0.9%    |
+----------+---------+--------+---------+--------+--------+----------+---------+----------+

DaCapo
+------------+----------+---------+--------+---------+--------+--------+----------+---------+----------+
| benchmark  | 116M NPS | 116M PS | 8M NNS | 48M NPS | 48M PS | 8M NNS | diff NPS | diff PS | diff NNS |
+------------+----------+---------+--------+---------+--------+--------+----------+---------+----------+
| avrora     | 2301     | 6324    | 4167   | 1887    | 5049   | 4080   | -18.00%  | -20.20% | -2.10%   |
| batik      | 6108     | 5301    | 4128   | 4686    | 4289   | 4114   | -23.30%  | -19.10% | -0.30%   |
| biojava    | 2018     | 5907    | 4047   | 1703    | 5364   | 4026   | -15.60%  | -9.20%  | -0.50%   |
| eclipse    | 30862    | 26824   | 4275   | 27314   | 24330  | 4180   | -11.50%  | -9.30%  | -2.20%   |
| jme        | 1567     | 5987    | 3502   | 1315    | 5205   | 3491   | -16.10%  | -13.10% | -0.30%   |
| lusearch   | 5424     | 9145    | 4201   | 4699    | 7147   | 4100   | -13.40%  | -21.90% | -2.40%   |
| pmd        | 12011    | 14438   | 4232   | 10701   | 12456  | 4140   | -10.90%  | -13.70% | -2.20%   |
| sunflow    | 1707     | 4341    | 4082   | 1220    | 3174   | 4040   | -28.60%  | -26.90% | -1.00%   |
| tomcat     | 15228    | 23595   | 4292   | 13519   | 20686  | 4187   | -11.20%  | -12.30% | -2.50%   |
| graphchi   | 1243     | 5238    | 4009   | 1063    | 4375   | 3998   | -14.50%  | -16.50% | -0.30%   |
| xalan      | 5270     | 8363    | 4191   | 4784    | 6643   | 4100   | -9.20%   | -20.60% | -2.20%   |
| fop        | 11597    | 20814   | 4336   | 10361   | 18485  | 4256   | -10.70%  | -11.20% | -1.80%   |
| luindex    | 4013     | 5531    | 3697   | 3083    | 4384   | 3507   | -23.20%  | -20.70% | -5.20%   |
| zxing      | 4577     | 7267    | 4255   | 4044    | 5820   | 4164   | -11.60%  | -19.90% | -2.10%   |
| tradebeans | 10313    | 26983   | 4603   | 9210    | 24954  | 4522   | -10.70%  | -7.50%  | -1.80%   |
| tradesoap  | 16939    | 35276   | 4649   | 15245   | 30888  | 4549   | -10.00%  | -12.40% | -2.10%   |
+------------+----------+---------+--------+---------+--------+--------+----------+---------+----------+

Renaissance
+------------------+----------+---------+--------+---------+--------+--------+----------+---------+----------+
| benchmark        | 116M NPS | 116M PS | 8M NNS | 48M NPS | 48M PS | 8M NNS | diff NPS | diff PS | diff NNS |
+------------------+----------+---------+--------+---------+--------+--------+----------+---------+----------+
| akka-uct         | 4053     | 9615    | 3661   | 3001    | 8381   | 3559   | -26.00%  | -12.80% | -2.80%   |
| als              | 20732    | 39367   | 4554   | 18914   | 32400  | 4464   | -8.80%   | -17.70% | -2.00%   |
| chi-square       | 7922     | 23568   | 3828   | 7160    | 20603  | 3759   | -9.60%   | -12.60% | -1.80%   |
| dec-tree         | 23938    | 55512   | 4026   | 21857   | 36866  | 3946   | -8.70%   | -33.60% | -2.00%   |
| dotty            | 42405    | 40963   | 3712   | 37997   | 32770  | 3621   | -10.40%  | -20.00% | -2.50%   |
| finagle-chirper  | 21150    | 19833   | 3795   | 18652   | 17479  | 3693   | -11.80%  | -11.90% | -2.70%   |
| finagle-http     | 11950    | 19553   | 3778   | 10675   | 17234  | 3709   | -10.70%  | -11.90% | -1.80%   |
| fj-kmeans        | 960      | 4756    | 3504   | 882     | 4437   | 3484   | -8.10%   | -6.70%  | -0.60%   |
| future-genetic   | 1760     | 5470    | 3526   | 1466    | 4449   | 3497   | -16.70%  | -18.70% | -0.80%   |
| gauss-mix        | 11910    | 21406   | 4459   | 10675   | 18741  | 4382   | -10.40%  | -12.40% | -1.70%   |
| log-regression   | 25230    | 42802   | 4108   | 22791   | 34542  | 3989   | -9.70%   | -19.30% | -2.90%   |
| mnemonics        | 1094     | 3914    | 3501   | 1010    | 3669   | 3480   | -7.70%   | -6.30%  | -0.60%   |
| movie-lens       | 20571    | 23472   | 4495   | 18500   | 20728  | 4424   | -10.10%  | -11.70% | -1.60%   |
| naive-bayes      | 24305    | 45967   | 4030   | 22124   | 35135  | 3929   | -9.00%   | -23.60% | -2.50%   |
| page-rank        | 9386     | 24226   | 3817   | 8554    | 22081  | 3769   | -8.90%   | -8.90%  | -1.30%   |
| par-mnemonics    | 1217     | 4318    | 3501   | 1128    | 4098   | 3477   | -7.30%   | -5.10%  | -0.70%   |
| philosophers     | 2647     | 5765    | 3571   | 2146    | 4293   | 3506   | -18.90%  | -25.50% | -1.80%   |
| reactors         | 2663     | 5266    | 3632   | 2278    | 4321   | 3513   | -14.50%  | -17.90% | -3.30%   |
| rx-scrabble      | 2511     | 6721    | 3535   | 2131    | 5037   | 3506   | -15.10%  | -25.10% | -0.80%   |
| scala-doku       | 2106     | 6408    | 3522   | 1775    | 4744   | 3500   | -15.70%  | -26.00% | -0.60%   |
| scala-kmeans     | 1104     | 4634    | 3497   | 1002    | 4345   | 3481   | -9.20%   | -6.20%  | -0.50%   |
| scala-stm-bench7 | 3492     | 6611    | 3601   | 3158    | 5302   | 3509   | -9.60%   | -19.80% | -2.60%   |
| scrabble         | 1816     | 6046    | 3546   | 1460    | 4902   | 3496   | -19.60%  | -18.90% | -1.40%   |
+------------------+----------+---------+--------+---------+--------+--------+----------+---------+----------+

[1] CodeCache usage data from:
- Latest versions of SpecJbb, DaCapo and Renaissance benchmarks.
- An internal service with 15000+ compiled Java methods running without compilation issues with 64M CodeCache (TieredCompilation off) and with 127M segmented CodeCache.
- A recommendation to use 64M CodeCache (TieredCompilation off) to improve performance (https://github.com/aws/aws-graviton-getting-started/blob/main/java.md).
- IDEs like IntelliJ, CLion can use more than 130M but they don't rely on the default values.

Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.

From dholmes at openjdk.java.net Thu Feb 10 23:24:11 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 10 Feb 2022 23:24:11 GMT Subject: RFR: 8281585: Remove unused imports under test/lib and jtreg/gc In-Reply-To: References: Message-ID: On Thu, 10 Feb 2022 15:39:53 GMT, Leo Korinth wrote: > Remove unused imports under test/lib and jtreg/gc. They create lots of warnings if editing using an IDE. 
Tests in hotspot_gc passed. Looks fine. The proof of these changes is in compiling the files - how did you test the non-gc-test changes? Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7426 From dholmes at openjdk.java.net Thu Feb 10 23:33:09 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 10 Feb 2022 23:33:09 GMT Subject: RFR: 8281585: Remove unused imports under test/lib and jtreg/gc In-Reply-To: References: Message-ID: On Thu, 10 Feb 2022 15:39:53 GMT, Leo Korinth wrote: > Remove unused imports under test/lib and jtreg/gc. They create lots of warnings if editing using an IDE. Tests in hotspot_gc passed. Forgot to mention copyright years need updating before integrating! Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/7426 From yyang at openjdk.java.net Fri Feb 11 03:35:10 2022 From: yyang at openjdk.java.net (Yi Yang) Date: Fri, 11 Feb 2022 03:35:10 GMT Subject: RFR: 8275775: Add jcmd VM.classes to print details of all classes [v2] In-Reply-To: References: <_Pw-D6A2BD-4wx0mZ5lFvFlxBRylbA5WT9y5xgtDBvk=.fa85a604-e6a0-463b-8a4a-4ae7e210661a@github.com> Message-ID: On Tue, 18 Jan 2022 02:59:11 GMT, David Holmes wrote: >> Yi Yang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> 8275775 Add VM.classes to print details of all classes > > src/hotspot/share/oops/instanceKlass.cpp line 2069: > >> 2067: ResourceMark rm; >> 2068: _st->print("%-18s", "KlassAddr"); >> 2069: _st->print(" "); > > Can't you just print the two spaces in the previous line: > > _st->print("%-18s ", "KlassAddr"); > > and save all the additional print calls. This applies throughout where you have " ". @dholmes-ora David, Can you please take a look at the latest version? 
I've addressed all problems you pointed out. ------------- PR: https://git.openjdk.java.net/jdk/pull/7105 From kvn at openjdk.java.net Fri Feb 11 04:17:10 2022 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 11 Feb 2022 04:17:10 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v8] In-Reply-To: <6_ddanyI-FFaerYCGBHYYGlJQZpUvypaIIoPOq6S3wM=.b77c72f2-e29a-4d31-826c-f42c737978d1@github.com> References: <6_ddanyI-FFaerYCGBHYYGlJQZpUvypaIIoPOq6S3wM=.b77c72f2-e29a-4d31-826c-f42c737978d1@github.com> Message-ID: On Thu, 10 Feb 2022 08:46:50 GMT, Emanuel Peter wrote: >> Deprecated ExtendedDTraceProbes. >> Edited help messages and man pages accordingly, added the 3 flags to man pages. >> Added flag to VMDeprecatedOptions test. >> Replaced the flag with 3 flags in SDTProbesGNULinuxTest.java. >> >> Checked that tests are not affected. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fixes to documentation requested by reviewers Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7110 From kvn at openjdk.java.net Fri Feb 11 05:07:09 2022 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 11 Feb 2022 05:07:09 GMT Subject: RFR: 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment In-Reply-To: <4k5B_eeCIPWe4rTYueR7n0lixNRMFzItoV9U7lCfIbM=.ada0c192-7ca6-4d4b-bdcb-a912e7867aa5@github.com> References: <5tnuK3pwhbOWk8dJlEkELJoxEFhmDyZFwpG5DfkozQ4=.b3cad787-bb0d-4716-91ed-079669da8eb0@github.com> <4k5B_eeCIPWe4rTYueR7n0lixNRMFzItoV9U7lCfIbM=.ada0c192-7ca6-4d4b-bdcb-a912e7867aa5@github.com> Message-ID: <8jJY1_Q3nc_G2FTOG8A76nIXXWk1nE65wNN-8czawEI=.378b4e8b-cc86-4a0d-8506-8211f4d1bcc6@github.com> On Wed, 9 Feb 2022 18:55:06 GMT, Harold Seigel wrote: >> Dunno, maybe? 
I see the lot of other "small" options are `intx`, and the change like that would proliferate to all architectures that set `OptoLoopAlignment` as their `product_pd`. It also raises the question if `CodeEntryAlignment` should also be `int`? I'd rather keep this patch small, to be honest. > > Your comment makes sense. Thanks. Yes, flags types clean up should be separate issue. ------------- PR: https://git.openjdk.java.net/jdk/pull/7388 From kvn at openjdk.java.net Fri Feb 11 05:07:09 2022 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 11 Feb 2022 05:07:09 GMT Subject: RFR: 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment In-Reply-To: References: Message-ID: <8eS0mvN89K2bpyqyXyhKajH5mWjuzuCRQP7zlDK3E5g=.0e48304f-c9f6-45c7-bfe1-44812dd4a57f@github.com> On Tue, 8 Feb 2022 18:19:00 GMT, Aleksey Shipilev wrote: > I am following up on the performance issue where the culprit seems to be the too low `OptoLoopAlignment`. To perform better experiments, I suggest allowing larger alignments. > > Note that we cannot make `OptoLoopAlignment` larger than `CodeEntryAlignment`, because nmethod copy would break it, see assert in `MacroAssembler::align`. See [JDK-8273459](https://bugs.openjdk.java.net/browse/JDK-8273459) for latest discussion about it. So `CodeEntryAlignment` needs to be configurable as well. > > The default values for options are different per platform, so tests are x86_64 specific. > > No default value is changed, this only unblocks experiments. > > Additional testing: > - [x] New tests on Linux x86_64 fastdebug > - [x] New tests on Linux x86_64 release I agree with changes. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7388 From kbarrett at openjdk.java.net Fri Feb 11 05:43:45 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 11 Feb 2022 05:43:45 GMT Subject: RFR: 8281626: NonblockingQueue should use nullptr Message-ID: Please review this change to use nullptr instead of NULL throughout the NonblockingQueue class. Testing: mach5 tier1 ------------- Commit messages: - use nullptr throughout Changes: https://git.openjdk.java.net/jdk/pull/7438/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7438&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281626 Stats: 39 lines in 2 files changed: 0 ins; 0 del; 39 mod Patch: https://git.openjdk.java.net/jdk/pull/7438.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7438/head:pull/7438 PR: https://git.openjdk.java.net/jdk/pull/7438 From shade at openjdk.java.net Fri Feb 11 06:42:09 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 11 Feb 2022 06:42:09 GMT Subject: RFR: 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment In-Reply-To: <8eS0mvN89K2bpyqyXyhKajH5mWjuzuCRQP7zlDK3E5g=.0e48304f-c9f6-45c7-bfe1-44812dd4a57f@github.com> References: <8eS0mvN89K2bpyqyXyhKajH5mWjuzuCRQP7zlDK3E5g=.0e48304f-c9f6-45c7-bfe1-44812dd4a57f@github.com> Message-ID: On Fri, 11 Feb 2022 05:03:54 GMT, Vladimir Kozlov wrote: > I agree with changes. Thank you, I'll wait for another reviewer and then integrate. ------------- PR: https://git.openjdk.java.net/jdk/pull/7388 From shade at openjdk.java.net Fri Feb 11 06:46:07 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 11 Feb 2022 06:46:07 GMT Subject: RFR: 8281626: NonblockingQueue should use nullptr In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 05:37:27 GMT, Kim Barrett wrote: > Please review this change to use nullptr instead of NULL throughout the NonblockingQueue class. > > Testing: > mach5 tier1 Looks fine! ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7438 From dholmes at openjdk.java.net Fri Feb 11 07:01:10 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 11 Feb 2022 07:01:10 GMT Subject: RFR: 8275775: Add jcmd VM.classes to print details of all classes [v6] In-Reply-To: References: Message-ID: On Thu, 27 Jan 2022 09:17:09 GMT, Yi Yang wrote: >> Add VM.classes to print details of all classes, output looks like: >> >> 1. jcmd VM.classes >> >> KlassAddr Size State Flags LoaderName ClassName >> 0x0000000800c0b400 62 inited W bootstrap java.lang.invoke.LambdaForm$MH/0x0000000800c0b400 >> 0x0000000800c0b000 62 inited W bootstrap java.lang.invoke.LambdaForm$DMH/0x0000000800c0b000 >> 0x0000000800c0ac00 62 inited W bootstrap java.lang.invoke.LambdaForm$MH/0x0000000800c0ac00 >> ... >> >> 2. jcmd VM.classes verbose >> >> KlassAddr Size State Flags LoaderName ClassName >> 0x0000000800c0b400 62 inited W bootstrap java.lang.invoke.LambdaForm$MH/0x0000000800c0b400 >> java.lang.invoke.LambdaForm$MH/0x0000000800c0b400 {0x0000000800c0b400} >> - instance size: 2 >> - klass size: 62 >> - access: final synchronized >> - state: inited >> - name: 'java/lang/invoke/LambdaForm$MH+0x0000000800c0b400' >> - super: 'java/lang/Object' >> - sub: >> - arrays: NULL >> - methods: Array(0x00007f620841f210) >> - method ordering: Array(0x0000000800a7e5a8) >> - default_methods: Array(0x0000000000000000) >> - local interfaces: Array(0x00000008005af748) >> - trans. 
interfaces: Array(0x00000008005af748) >> - constants: constant pool [41] {0x00007f620841f030} for 'java/lang/invoke/LambdaForm$MH+0x0000000800c0b400' cache=0x00007f620841f380 >> - class loader data: loader data: 0x00007f61c804a690 of 'bootstrap' has a class holder >> - source file: 'LambdaForm$MH' >> - class annotations: Array(0x0000000000000000) >> - class type annotations: Array(0x0000000000000000) >> - field annotations: Array(0x0000000000000000) >> - field type annotations: Array(0x0000000000000000) >> - inner classes: Array(0x00000008005af6d8) >> - nest members: Array(0x00000008005af6d8) >> - permitted subclasses: Array(0x00000008005af6d8) >> - java mirror: a 'java/lang/Class'{0x000000011f4b3968} = 'java/lang/invoke/LambdaForm$MH+0x0000000800c0b400' >> - vtable length 5 (start addr: 0x0000000800c0b5b8) >> - itable length 2 (start addr: 0x0000000800c0b5e0) >> - ---- static fields (1 words): >> - static final '_D_0' 'Ljava/lang/invoke/LambdaForm;' @112 >> - ---- non-static fields (0 words): >> - non-static oop maps: >> 0x0000000800c0b000 62 inited W bootstrap java.lang.invoke.LambdaForm$DMH/0x0000000800c0b000 >> java.lang.invoke.LambdaForm$DMH/0x0000000800c0b000 {0x0000000800c0b000} >> - instance size: 2 >> - klass size: 62 >> - access: final synchronized >> - state: inited >> - name: 'java/lang/invoke/LambdaForm$DMH+0x0000000800c0b000' >> - super: 'java/lang/Object' >> - sub: >> - arrays: NULL >> - methods: Array(0x00007f620841ea68) >> - method ordering: Array(0x0000000800a7e5a8) >> - default_methods: Array(0x0000000000000000) >> - local interfaces: Array(0x00000008005af748) >> - trans. 
interfaces: Array(0x00000008005af748) >> - constants: constant pool [49] {0x00007f620841e838} for 'java/lang/invoke/LambdaForm$DMH+0x0000000800c0b000' cache=0x00007f620841ebe0 >> - class loader data: loader data: 0x00007f61c804a750 of 'bootstrap' has a class holder >> - source file: 'LambdaForm$DMH' >> - class annotations: Array(0x0000000000000000) >> - class type annotations: Array(0x0000000000000000) >> - field annotations: Array(0x0000000000000000) >> - field type annotations: Array(0x0000000000000000) >> - inner classes: Array(0x00000008005af6d8) >> - nest members: Array(0x00000008005af6d8) >> - permitted subclasses: Array(0x00000008005af6d8) >> - java mirror: a 'java/lang/Class'{0x000000011f4b0968} = 'java/lang/invoke/LambdaForm$DMH+0x0000000800c0b000' >> - vtable length 5 (start addr: 0x0000000800c0b1b8) >> - itable length 2 (start addr: 0x0000000800c0b1e0) >> - ---- static fields (1 words): >> - static final '_D_0' 'Ljava/lang/invoke/LambdaForm;' @112 >> - ---- non-static fields (0 words): >> ... > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > fix Hi Yi, I had been expecting to see further updates as not all issues seem resolved. I have a few further typos and nits below. But I'd like to see someone from serviceability actually approve this. Thanks, David src/hotspot/share/services/diagnosticCommand.cpp line 962: > 960: DCmdWithParser(output, heap), > 961: _verbose("-verbose", > 962: "Dump the detail content of Java class. " s/detail/detailed/ s/of/of a/ src/hotspot/share/services/diagnosticCommand.cpp line 964: > 962: "Dump the detail content of Java class. " > 963: "Some classes are annotated with flags: " > 964: "F = has finializer method, " typo finializer - but should be finalize Is this actually only present for "non-trivial finalize" method? 
src/hotspot/share/services/diagnosticCommand.cpp line 966: > 964: "F = has finializer method, " > 965: "f = has final method, " > 966: "V = has vanilla constructor, " What is a vanilla constructor? There is no such term in JLS. src/hotspot/share/services/diagnosticCommand.cpp line 968: > 966: "V = has vanilla constructor, " > 967: "W = methods rewritten, " > 968: "C = marked with contended annotation, " @contended ------------- PR: https://git.openjdk.java.net/jdk/pull/7105 From dholmes at openjdk.java.net Fri Feb 11 07:01:11 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 11 Feb 2022 07:01:11 GMT Subject: RFR: 8275775: Add jcmd VM.classes to print details of all classes [v6] In-Reply-To: References: Message-ID: On Thu, 27 Jan 2022 16:00:54 GMT, Ioi Lam wrote: >> Yi Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> fix > > src/hotspot/share/oops/instanceKlass.cpp line 2081: > >> 2079: _st->print(INTPTR_FORMAT " ", p2i(k)); >> 2080: // klass size >> 2081: _st->print("%-4d ", k->size()); > > Should be `%4d` so that the numbers are aligned correctly. This issue seem still outstanding. ------------- PR: https://git.openjdk.java.net/jdk/pull/7105 From dholmes at openjdk.java.net Fri Feb 11 07:10:09 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 11 Feb 2022 07:10:09 GMT Subject: RFR: 8281626: NonblockingQueue should use nullptr In-Reply-To: References: Message-ID: <4ErT0Fu5rDCnLIq-LfhqnvVMV6uALY8wLUisGI7gN-E=.e92008cf-4b9b-443b-b994-92a3f4dfb03c@github.com> On Fri, 11 Feb 2022 05:37:27 GMT, Kim Barrett wrote: > Please review this change to use nullptr instead of NULL throughout the NonblockingQueue class. > > Testing: > mach5 tier1 Looks good and trivial. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7438

From duke at openjdk.java.net Fri Feb 11 08:02:11 2022
From: duke at openjdk.java.net (duke)
Date: Fri, 11 Feb 2022 08:02:11 GMT
Subject: Withdrawn: 8276618: Pad cacheline for Thread::_rcu_counter
In-Reply-To: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
References: <6kHhrYgTQ2_ST7TG7H0Syf6_QR8OW4qTc1KGIRJMhWE=.e29aee68-ca4e-46b0-a930-fc38e5176ca9@github.com>
Message-ID:

On Thu, 4 Nov 2021 05:09:48 GMT, Hamlin Li wrote:

> Currently, Thread::_rcu_counter is not padded to a cache line; it should be beneficial to do so.
>
> The initial specjbb test shows about 10.5% improvement of critical, and 0.7% improvement of max in specjbb2015.
>
>
>
> ========= test result (1st round) ==========
> rcu            base
> 45096          38980
> 41741          41468
> 42349          41053
> 44485          42030
> 47103          39915
> 43864          36004
>
> ==== average ====
> 44106.33333    39908.33333
>
> ==== improvement ====
> 10.5%
>
> ========= test result (2nd round) ==========
> Second round of run includes 3 types:
> 1. pad gc data & pad rcu
> 2. pad rcu only
> 3. base
>
> Although the improvement is not as large as in the previous round (10%), it is still about 3~4%.
>
> gc data & rcu    rcu            base
> 41284            41860          37099
> 42296            42166          44692
> 42810            43423          41801
> 43492            45603          40274
> 43808            40641          39627
> 43029            40242          39793
> 42543            41662          41544
> 43420            42702          37991
> 44212            43354          40319
> 42692            43442          45264
> 44773            44577          44213
> 40835            41870          42008
> 44282            44167          42527
>
> ==== average ====
> 43036.61538      42746.84615    41319.38462
>
> ==== improvement ====
> gc data + rcu / base: 4.156%
> rcu / base: 3.45%
>
>
> ========= configuration and environment ==========
> specjbb arguments:
> GROUP_COUNT=4
> TI_JVM_COUNT=1
>
> SPEC_OPTS_C="-Dspecjbb.group.count=$GROUP_COUNT -Dspecjbb.txi.pergroup.count=$TI_JVM_COUNT"
> SPEC_OPTS_TI=""
> SPEC_OPTS_BE=""
>
> JAVA_OPTS_C="-server -Xms2g -Xmx2g -XX:+UseParallelGC"
> JAVA_OPTS_TI="-server -Xms2g -Xmx2g -XX:+UseParallelGC"
> JAVA_OPTS_BE="-server -XX:+UseG1GC -Xms32g -Xmx32g"
>
> MODE_ARGS_C="-ikv"
> MODE_ARGS_TI="-ikv"
> MODE_ARGS_BE="-ikv"
>
> NUM_OF_RUNS=1
>
> HW:
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              224
> On-line CPU(s) list: 0-223
> Thread(s) per core:  2
> Core(s) per socket:  28
> Socket(s):           4
> NUMA node(s):        4
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               85
> Model name:          Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz
> Stepping:            4
> CPU MHz:             1001.925
> CPU max MHz:         2101.0000
> CPU min MHz:         1000.0000
> BogoMIPS:            4200.00
> Virtualization:      VT-x
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            1024K
> L3 cache:            39424K
> NUMA node0 CPU(s):   0-27,112-139
> NUMA node1 CPU(s):   28-55,140-167
> NUMA node2 CPU(s):   56-83,168-195
> NUMA node3 CPU(s):   84-111,196-223
>
>        total   used   free   shared   buff/cache   available
> Mem:   3.0T    3.8G   2.9T   18M      25G          2.9T
> Swap:  99G     0B     99G

This pull request has been closed without being integrated.
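
For readers unfamiliar with the technique discussed in this withdrawn PR, the following is a minimal sketch of cache-line padding in standalone C++. It is not the HotSpot change itself: the struct and field names (`UnpaddedThreadData`, `PaddedCounter`, `hot_neighbour`, and so on) are hypothetical, and the 64-byte line size is an assumption that matches common x86_64 parts but is not guaranteed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size in bytes (typical for x86_64, not guaranteed).
constexpr std::size_t kCacheLine = 64;

// Unpadded layout: the counter shares a cache line with its neighbours,
// so a write to a neighbour can invalidate the line holding the counter.
struct UnpaddedThreadData {
  std::uint64_t hot_neighbour;
  std::atomic<std::uint64_t> rcu_counter;
  std::uint64_t other_neighbour;
};

// Padded wrapper: alignas rounds both the alignment and the size of the
// type up to one full cache line, so nothing else can share that line.
struct alignas(kCacheLine) PaddedCounter {
  std::atomic<std::uint64_t> value;
};

struct PaddedThreadData {
  std::uint64_t hot_neighbour;
  PaddedCounter rcu_counter;      // starts on its own cache line
  std::uint64_t other_neighbour;  // pushed onto the following line
};

// Index of the cache line (relative to the struct start) holding an offset.
inline std::size_t cache_line_index(std::size_t offset) {
  return offset / kCacheLine;
}
```

The cost is memory: in this sketch the padded struct grows from 24 bytes to three cache lines, which is why such padding is normally justified with measurements like the ones quoted above.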
-------------

PR: https://git.openjdk.java.net/jdk/pull/6246

From vladimir.x.ivanov at oracle.com Thu Feb 10 19:29:45 2022
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 10 Feb 2022 22:29:45 +0300
Subject: RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
In-Reply-To:
References:
Message-ID: <919984fa-fae7-944a-ca5d-eebd3b516f52@oracle.com>

(BCCing hotspot-dev and moving the discussion to hotspot-compiler-dev.)

Hi Cesar,

Thanks for looking into enhancing EA.

Overall, the proposal looks reasonable.

I suggest to look more closely at split_unique_types(). It introduces a dedicated class of alias categories for fields of the allocation being eliminated and clones memory graph. I don't see why it shouldn't work for multiple allocations.

Moreover, split_unique_types() will break if you start optimizing multiple allocations at once. The notion of unique alias should be adjusted and cover the union of unique aliases for all interacting allocations.

Seems like you need to enhance SR to work on non-intersecting clusters of allocations.

One thing to take care of: scalar replacement relies on TypeOopPtr::instance_id().

  // If not InstanceTop or InstanceBot, indicates that this is
  // a particular instance of this type which is distinct.
  // This is the node index of the allocation node creating this instance.
  int _instance_id;

It'll break when multiple allocations are in play.

Best regards,
Vladimir Ivanov

On 09.02.2022 04:45, Cesar Soares Lucas wrote:
> Hi there again!
>
> Can you please give me feedback on the following approach to at least partially
> address [1], the scalar replacement allocation merge issue?
>
> The problem that I am trying to solve arises when allocations are merged after a
> control flow split. The code below shows _one example_ of such a situation.
>
> public int ex1(boolean cond, int x, int y) {
>     Point p = new Point(x, y);
>     if (cond)
>         p = new Point(y, x);
>     // Allocations for p are merged here.
>     return p.calc();
> }
>
> Assuming the method calls on "p" are inlined then the allocations will not
> escape the method. The C2 IR for this method will look like this:
>
> public int ex1(boolean cond, int first, int second) {
>     p0 = Allocate(...);
>     ...
>     p0.x = first;
>     p0.y = second;
>
>     if (cond) {
>         p1 = Allocate(...);
>         ...
>         p1.x = second;
>         p1.y = first;
>     }
>
>     p = phi(p0, p1)
>
>     return p.x - p.y;
> }
>
> However, one of the constraints implemented here [2], specifically the third
> one, will prevent the objects from being scalar replaced.
>
> The approach that I'm considering for solving the problem is to replace the Phi
> node `p = phi(p0, p1)` with new Phi nodes for each of the fields of the objects
> in the original Phi. The IR for `ex1` would look something like this after the
> transformation:
>
> public int ex1(boolean cond, int first, int second) {
>     p0 = Allocate(...);
>     ...
>     p0.x = first;
>     p0.y = second;
>
>     if (cond) {
>         p1 = Allocate(...);
>         ...
>         p1.x = second;
>         p1.y = first;
>     }
>
>     pX = phi(first, second)
>     pY = phi(second, first)
>
>     return pX - pY;
> }
>
> I understand that this transformation might not be applicable for all cases and
> that it's not as simple as illustrated above. Also, it seems to me that much of
> what I'd have to implement is already implemented in other steps of the Scalar
> Replacement pipeline (which is a good thing). To work around these
> implementation details I plan to use as much of the existing code as possible.
> The algorithm for the transformation would be like this:
>
> split_phis(phi)
>     # If output of phi escapes, or something uses its identity, etc
>     # then we can't remove it. The conditions here might possibly be the
>     # same as the ones implemented in `PhaseMacroExpand::can_eliminate_allocation`
>     if cant_remove_phi_output(phi)
>         return ;
>
>     # Collect a set of tuples(F,U) containing nodes U that uses field F
>     # member of the object resulting from `phi`.
>     fields_used = collect_fields_used_after_phi(phi)
>
>     foreach field in fields_used
>         producers = {}
>
>         # Create a list with the last Store for each field "field" on the
>         # scope of each of the Phi input objects.
>         foreach o in phi.inputs
>             # The function called below might re-use a lot of the code/logic in `PhaseMacroExpand::scalar_replacement`
>             producers += last_store_to_o_field(o, field)
>
>         # Create a new phi node whose inputs are the Store's to 'field'
>         field_phi = create_new_phi(producers)
>
>         update_consumers(field, field_phi)
>
> The implementation that I envisioned would be as a "pre-process" [3] step just
> after EA but before the constraint checks in `adjust_scalar_replaceable_state`
> [2]. If we agree that the overall Scalar Replacement implementation goes through
> the following major phases:
>
>     1. Identify the Escape Status of objects.
>     2. Adjust object Escape and/or Scalar Replacement status based on a set of constraints.
>     3. Make call to Split_unique_types [4].
>     4. Iterate over object and array allocations.
>         4.1 Check if allocation can be eliminated.
>         4.2 Perform scalar replacement. Replace uses of object in Safepoints.
>         4.3 Process users of CheckCastPP other than Safepoint: AddP, ArrayCopy and CastP2X.
>
> The transformation that I am proposing would change the overall flow to look
> like this:
>
>     1. Identify the Escape Status of objects.
>     2. ----> New: "Split phi functions" <----
>     2. Adjust object Escape and/or Scalar Replacement status based on a set of constraints.
>     3. Make call to Split_unique_types [4].
>     4. Iterate over object and array allocations.
>         4.1 ----> Moved to split_phi: "Check if allocation can be eliminated" <----
>         4.2 Perform scalar replacement. Replace uses of object in Safepoints.
>         4.3 Process users of CheckCastPP other than Safepoint: AddP, ArrayCopy and CastP2X.
>
> Please let me know what you think and thank you for taking the time to review
> this!
>
>
> Regards,
> Cesar
>
> Notes:
>
>     [1] I am not sure yet how this approach will play with the case of a merge
>         with NULL.
>
>     [2] https://github.com/openjdk/jdk/blob/2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff/src/hotspot/share/opto/escape.cpp#L1809
>
>     [3] Another option would be to "patch" the current implementation to be able
>         to handle the merges. I am not certain that the "patch" approach would be
>         better, however, the "pre-process" approach is certainly much easier to test
>         and more readable.
>
>     [4] I cannot say I understand 100% the effects of executing
>         split_unique_types(). Would the transformation that I am proposing need to
>         be after the call to split_unique_types?

From shade at openjdk.java.net Fri Feb 11 08:50:11 2022
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Fri, 11 Feb 2022 08:50:11 GMT
Subject: RFR: 8072070: Improve interpreter stack banging [v5]
In-Reply-To:
References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com>
Message-ID:

On Tue, 8 Feb 2022 17:24:41 GMT, Aleksey Shipilev wrote:

>> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths.
>>
>> The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc.
I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well.
>>
>> This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration.
>>
>> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around.
>>
>> I think it is fairly complete, and so would like to solicit more feedback and testing here.
>>
>> Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions:
>>
>>
>> compiler.compiler: +77%
>> compiler.sunflow: +69%
>> compress: +166%
>> crypto.rsa: +15%
>> crypto.signverify: +70%
>> mpegaudio: +8%
>> serial: +50%
>> sunflow: +57%
>> xml.transform: +61%
>> xml.validation: +43%
>>
>>
>> My new `java.lang.invoke` benchmarks improve a lot as well:
>>
>>
>> Benchmark              Mode  Cnt    Score   Error  Units
>>
>> # Mainline
>> MHInvoke.methodHandle  avgt    5  799.671 ± 9.087  ns/op
>> MHInvoke.plain         avgt    5  261.947 ± 1.421  ns/op
>> VHGet.plain            avgt    5  231.372 ± 3.044  ns/op
>> VHGet.varHandle        avgt    5  924.880 ± 6.026  ns/op
>>
>> # This WIP
>> MHInvoke.methodHandle  avgt    5  240.456 ± 3.931  ns/op
>> MHInvoke.plain         avgt    5   70.851 ± 1.986  ns/op
>> VHGet.plain            avgt    5   52.506 ± 3.768  ns/op
>> VHGet.varHandle        avgt    5  335.785 ± 4.398  ns/op
>>
>>
>> It also palpably improves startup even on small HelloWorld, _even when compilers are present_:
>>
>>
>> $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null
>>
>> Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs):
>>
>>   22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% )
>>   96 context-switches # 4.353 K/sec ( +- 0.07% )
>>   7 cpu-migrations # 333.181 /sec ( +- 0.32% )
>>   2,437 page-faults # 110.469 K/sec ( +- 0.00% )
>>   78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%)
>>   2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%)
>>   2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%)
>>   67,296,528 instructions # 0.85 insn per cycle
>>     # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%)
>>   12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%)
>>   384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%)
>>
>>   0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% )
>>
>> $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null
>>
>> Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs):
>>
>>   21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% )
>>   98 context-switches # 4.519 K/sec ( +- 0.07% )
>>   7 cpu-migrations # 339.292 /sec ( +- 0.31% )
>>   2,434 page-faults # 111.755 K/sec ( +- 0.00% )
>>   77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%)
>>   2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%)
>>   2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%)
>>   66,742,892 instructions # 0.86 insn per cycle
>>     # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%)
>>   12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%)
>>   386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%)
>>
>>   0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% )
>>
>>
>> Additional
testing: >> - [x] Linux x86_64 fastdebug, `tier1` >> - [x] Linux x86_64 fastdebug, `tier2` >> - [x] Linux x86_64 fastdebug, `tier3` >> - [x] Linux x86_64 fastdebug, `tier4` >> - [x] Linux x86_32 fastdebug, `tier1` >> - [x] Linux x86_32 fastdebug, `tier2` >> - [x] Linux x86_32 fastdebug, `tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Show watermark in better place on the chart I am sure nothing bad is going to happen if I integrate this on Friday! ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From shade at openjdk.java.net Fri Feb 11 08:50:11 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 11 Feb 2022 08:50:11 GMT Subject: Integrated: 8072070: Improve interpreter stack banging In-Reply-To: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Thu, 27 Jan 2022 18:42:15 GMT, Aleksey Shipilev wrote: > This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. > > The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. 
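
The safe limit / growth watermark idea described in the paragraph above can be modelled in a few lines of standalone C++. This is only an illustrative sketch under stated assumptions, not the code from the PR: the names `StackModel`, `on_method_entry` and `kShadowPages` are hypothetical, the stack is assumed to grow towards lower addresses, and "banging" is reduced to counting touched pages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed number of shadow-zone pages banged below the stack pointer.
constexpr std::size_t kShadowPages = 20;

// Simplified per-thread state; addresses decrease as the stack grows.
struct StackModel {
  std::uintptr_t safe_limit;        // at or below this, a bang may hit the guard zone
  std::uintptr_t growth_watermark;  // lowest SP whose shadow zone was already banged
  std::size_t pages_touched;        // running total, for illustration only
};

// Models one method entry and returns how many pages were banged.
inline std::size_t on_method_entry(StackModel& s, std::uintptr_t sp) {
  // Fast path: the stack has been this deep before, so the shadow zone
  // below this SP is already known to be accessible.
  if (sp >= s.growth_watermark) {
    return 0;
  }
  // Slow path: bang the whole shadow zone below the new, deeper SP.
  s.pages_touched += kShadowPages;
  // Record the new depth only while safely above the guard zone, so that
  // entries close to the stack end keep banging and keep detecting overflow.
  if (sp > s.safe_limit) {
    s.growth_watermark = sp;
  }
  return kShadowPages;
}
```

In this model a thread that repeatedly calls methods at a depth it has already reached pays nothing on entry, which is where the interpreter-only speedups would come from; only entries near the end of the stack keep the full bang.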
>
> This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration.
>
> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around.
>
> I think it is fairly complete, and so would like to solicit more feedback and testing here.
>
> Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions:
>
>
> compiler.compiler: +77%
> compiler.sunflow: +69%
> compress: +166%
> crypto.rsa: +15%
> crypto.signverify: +70%
> mpegaudio: +8%
> serial: +50%
> sunflow: +57%
> xml.transform: +61%
> xml.validation: +43%
>
>
> My new `java.lang.invoke` benchmarks improve a lot as well:
>
>
> Benchmark              Mode  Cnt    Score   Error  Units
>
> # Mainline
> MHInvoke.methodHandle  avgt    5  799.671 ± 9.087  ns/op
> MHInvoke.plain         avgt    5  261.947 ± 1.421  ns/op
> VHGet.plain            avgt    5  231.372 ± 3.044  ns/op
> VHGet.varHandle        avgt    5  924.880 ± 6.026  ns/op
>
> # This WIP
> MHInvoke.methodHandle  avgt    5  240.456 ± 3.931  ns/op
> MHInvoke.plain         avgt    5   70.851 ± 1.986  ns/op
> VHGet.plain            avgt    5   52.506 ± 3.768  ns/op
> VHGet.varHandle        avgt    5  335.785 ± 4.398  ns/op
>
>
> It also palpably improves startup even on small HelloWorld, _even when compilers are present_:
>
>
> $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null
>
> Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs):
>
>   22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% )
>   96 context-switches # 4.353 K/sec ( +- 0.07% )
>   7 cpu-migrations # 333.181 /sec ( +- 0.32% )
>   2,437 page-faults # 110.469 K/sec ( +- 0.00% )
>   78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%)
>   2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%)
>   2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%)
>   67,296,528 instructions # 0.85 insn per cycle
>     # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%)
>   12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%)
>   384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%)
>
>   0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% )
>
> $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null
>
> Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs):
>
>   21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% )
>   98 context-switches # 4.519 K/sec ( +- 0.07% )
>   7 cpu-migrations # 339.292 /sec ( +- 0.31% )
>   2,434 page-faults # 111.755 K/sec ( +- 0.00% )
>   77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%)
>   2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%)
>   2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%)
>   66,742,892 instructions # 0.86 insn per cycle
>     # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%)
>   12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%)
>   386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%)
>
>   0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% )
>
>
> Additional testing:
> - [x] Linux x86_64 fastdebug,
`tier1` > - [x] Linux x86_64 fastdebug, `tier2` > - [x] Linux x86_64 fastdebug, `tier3` > - [x] Linux x86_64 fastdebug, `tier4` > - [x] Linux x86_32 fastdebug, `tier1` > - [x] Linux x86_32 fastdebug, `tier2` > - [x] Linux x86_32 fastdebug, `tier3` This pull request has now been integrated. Changeset: 3a13425b Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/3a13425bc9088cbb6d95e1a46248d7eba27fb1a6 Stats: 177 lines in 5 files changed: 155 ins; 4 del; 18 mod 8072070: Improve interpreter stack banging Reviewed-by: xliu, coleenp, mdoerr ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From duke at openjdk.java.net Fri Feb 11 08:52:48 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Fri, 11 Feb 2022 08:52:48 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v9] In-Reply-To: References: Message-ID: > Deprecated ExtendedDTraceProbes. > Edited help messages and man pages accordingly, added the 3 flags to man pages. > Added flag to VMDeprecatedOptions test. > Replaced the flag with 3 flags in SDTProbesGNULinuxTest.java. > > Checked that tests are not affected. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix in response to suggestion by David Holmes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7110/files - new: https://git.openjdk.java.net/jdk/pull/7110/files/af11b456..78d8e00a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7110&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7110&range=07-08 Stats: 9 lines in 2 files changed: 2 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/7110.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7110/head:pull/7110 PR: https://git.openjdk.java.net/jdk/pull/7110 From lkorinth at openjdk.java.net Fri Feb 11 08:54:51 2022 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Fri, 11 Feb 2022 08:54:51 GMT Subject: RFR: 8281585: Remove unused imports under test/lib and jtreg/gc [v2] In-Reply-To: References: Message-ID: > Remove unused imports under test/lib and jtreg/gc. They create lots of warnings if editing using an IDE. Tests in hotspot_gc passed. 
Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: updating copyright ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7426/files - new: https://git.openjdk.java.net/jdk/pull/7426/files/6aaa1a3a..7d3e7a1b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7426&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7426&range=00-01 Stats: 59 lines in 59 files changed: 0 ins; 0 del; 59 mod Patch: https://git.openjdk.java.net/jdk/pull/7426.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7426/head:pull/7426 PR: https://git.openjdk.java.net/jdk/pull/7426 From lkorinth at openjdk.java.net Fri Feb 11 09:01:06 2022 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Fri, 11 Feb 2022 09:01:06 GMT Subject: RFR: 8281585: Remove unused imports under test/lib and jtreg/gc [v2] In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 08:54:51 GMT, Leo Korinth wrote: >> Remove unused imports under test/lib and jtreg/gc. They create lots of warnings if editing using an IDE. Tests in hotspot_gc passed. > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > updating copyright I have a maven project that compiles test/lib and jtreg/gc, so everything changed does compile, I should have mentioned that. I have updated copyright year on all files now as well. ------------- PR: https://git.openjdk.java.net/jdk/pull/7426 From kbarrett at openjdk.java.net Fri Feb 11 09:09:10 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 11 Feb 2022 09:09:10 GMT Subject: RFR: 8281626: NonblockingQueue should use nullptr In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 06:42:40 GMT, Aleksey Shipilev wrote: >> Please review this change to use nullptr instead of NULL throughout the NonblockingQueue class. >> >> Testing: >> mach5 tier1 > > Looks fine! Thanks @shipilev and @dholmes-ora for reviews. 
I waffled about suggesting it's trivial; I'll take your suggestion, and go ahead and push now. ------------- PR: https://git.openjdk.java.net/jdk/pull/7438 From kbarrett at openjdk.java.net Fri Feb 11 09:09:11 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 11 Feb 2022 09:09:11 GMT Subject: Integrated: 8281626: NonblockingQueue should use nullptr In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 05:37:27 GMT, Kim Barrett wrote: > Please review this change to use nullptr instead of NULL throughout the NonblockingQueue class. > > Testing: > mach5 tier1 This pull request has now been integrated. Changeset: 90939cb8 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/90939cb80193c671cae635b7a4e41bd2e6bcdbd5 Stats: 39 lines in 2 files changed: 0 ins; 0 del; 39 mod 8281626: NonblockingQueue should use nullptr Reviewed-by: shade, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/7438 From adinn at openjdk.java.net Fri Feb 11 09:59:13 2022 From: adinn at openjdk.java.net (Andrew Dinn) Date: Fri, 11 Feb 2022 09:59:13 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v5] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Tue, 8 Feb 2022 17:24:41 GMT, Aleksey Shipilev wrote: >> This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. >> >> The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. 
I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. >> >> This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration. >> >> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around. >> >> I think it is fairly complete, and so would like to solicit more feedback and testing here. >> >> Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions: >> >> >> compiler.compiler: +77% >> compiler.sunflow: +69% >> compress: +166% >> crypto.rsa: +15% >> crypto.signverify: +70% >> mpegaudio: +8% >> serial: +50% >> sunflow: +57% >> xml.transform: +61% >> xml.validation: +43% >> >> >> My new `java.lang.invoke` benchmarks improve a lot as well: >> >> >> Benchmark Mode Cnt Score Error Units >> >> # Mainline >> MHInvoke.methodHandle avgt 5 799.671 ± 9.087 ns/op >> MHInvoke.plain avgt 5 261.947 ± 1.421 ns/op >> VHGet.plain avgt 5 231.372 ± 3.044 ns/op >> VHGet.varHandle avgt 5 924.880 ± 6.026 ns/op >> >> # This WIP >> MHInvoke.methodHandle avgt 5 240.456 ± 3.931 ns/op >> MHInvoke.plain avgt 5 70.851 ± 1.986 ns/op >> VHGet.plain avgt 5 52.506 ± 3.768 ns/op >> VHGet.varHandle avgt 5 335.785 ±
4.398 ns/op >> >> >> It also palpably improves startup even on small HelloWorld, _even when compilers are present_: >> >> >> $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% ) >> 96 context-switches # 4.353 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 333.181 /sec ( +- 0.32% ) >> 2,437 page-faults # 110.469 K/sec ( +- 0.00% ) >> 78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%) >> 2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%) >> 2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%) >> 67,296,528 instructions # 0.85 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%) >> 12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%) >> 384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%) >> >> 0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% ) >> >> $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% ) >> 98 context-switches # 4.519 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 339.292 /sec ( +- 0.31% ) >> 2,434 page-faults # 111.755 K/sec ( +- 0.00% ) >> 77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%) >> 2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%) >> 2,059,440 stalled-cycles-backend # 2.65% backend cycles idle ( +- 1.11% ) (71.82%) >> 66,742,892 instructions # 0.86 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%) >> 12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%) >> 386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%) >> >> 0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% ) >> >> >> Additional 
testing: >> - [x] Linux x86_64 fastdebug, `tier1` >> - [x] Linux x86_64 fastdebug, `tier2` >> - [x] Linux x86_64 fastdebug, `tier3` >> - [x] Linux x86_64 fastdebug, `tier4` >> - [x] Linux x86_32 fastdebug, `tier1` >> - [x] Linux x86_32 fastdebug, `tier2` >> - [x] Linux x86_32 fastdebug, `tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Show watermark in better place on the chart ship it and be damned :-) ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From sspitsyn at openjdk.java.net Fri Feb 11 10:14:09 2022 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Fri, 11 Feb 2022 10:14:09 GMT Subject: RFR: 8281585: Remove unused imports under test/lib and jtreg/gc [v2] In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 08:54:51 GMT, Leo Korinth wrote: >> Remove unused imports under test/lib and jtreg/gc. They create lots of warnings if editing using an IDE. Tests in hotspot_gc passed. > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > updating copyright Looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7426 From thartmann at openjdk.java.net Fri Feb 11 10:24:10 2022 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 11 Feb 2022 10:24:10 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v9] In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 08:52:48 GMT, Emanuel Peter wrote: >> Deprecated ExtendedDTraceProbes. >> Edited help messages and man pages accordingly, added the 3 flags to man pages. >> Added flag to VMDeprecatedOptions test. >> Replaced the flag with 3 flags in SDTProbesGNULinuxTest.java. >> >> Checked that tests are not affected. 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix in response to suggestion by David Holmes Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7110 From sjohanss at openjdk.java.net Fri Feb 11 10:42:09 2022 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 11 Feb 2022 10:42:09 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock [v4] In-Reply-To: References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: On Thu, 10 Feb 2022 17:23:42 GMT, Albert Mingkun Yang wrote: >> This PR consists of two commits: >> >> 1. remove `ExpandHeap_lock` in Serial GC code. >> 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > lower case Looks good, even if I preferred the old use of `needs_expand()`. src/hotspot/share/gc/parallel/psOldGen.cpp line 180: > 178: bool needs_expand = > 179: pointer_delta(object_space()->end(), object_space()->top()) < word_size; > 180: if (needs_expand) { To me the old code reads better, but I guess it's a matter of taste. The predicate could be moved to PSOldGen to allow asserting that the lock is held. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7124 From mdoerr at openjdk.java.net Fri Feb 11 11:32:11 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 11 Feb 2022 11:32:11 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v5] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Tue, 8 Feb 2022 17:24:41 GMT, Aleksey Shipilev wrote: >> This is an old issue, I submitted the first RFE about this back in 2015. 
This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get `java.lang.invoke` infra work reasonably fast when cold, which includes lots of interpreter paths. >> >> The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the `native_call`, because we might as well bang the entire shadow zone in native case as well. >> >> This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why `native_call` argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration. >> >> I tried to capture the current mechanics of stack banging in `stackOverflow.hpp`, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around. >> >> I think it is fairly complete, and so would like to solicit more feedback and testing here. >> >> Point runs on SPECjvm2008 with `-Xint` shows huge improvements on half of the tests, without any regressions: >> >> >> compiler.compiler: +77% >> compiler.sunflow: +69% >> compress: +166% >> crypto.rsa: +15% >> crypto.signverify: +70% >> mpegaudio: +8% >> serial: +50% >> sunflow: +57% >> xml.transform: +61% >> xml.validation: +43% >> >> >> My new `java.lang.invoke` benchmarks improve a lot as well: >> >> >> Benchmark Mode Cnt Score Error Units >> >> # Mainline >> MHInvoke.methodHandle avgt 5 799.671 ±
9.087 ns/op >> MHInvoke.plain avgt 5 261.947 ± 1.421 ns/op >> VHGet.plain avgt 5 231.372 ± 3.044 ns/op >> VHGet.varHandle avgt 5 924.880 ± 6.026 ns/op >> >> # This WIP >> MHInvoke.methodHandle avgt 5 240.456 ± 3.931 ns/op >> MHInvoke.plain avgt 5 70.851 ± 1.986 ns/op >> VHGet.plain avgt 5 52.506 ± 3.768 ns/op >> VHGet.varHandle avgt 5 335.785 ± 4.398 ns/op >> >> >> It also palpably improves startup even on small HelloWorld, _even when compilers are present_: >> >> >> $ perf stat -r 5000 build/baseline/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/baseline/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 22.06 msec task-clock # 1.030 CPUs utilized ( +- 0.04% ) >> 96 context-switches # 4.353 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 333.181 /sec ( +- 0.32% ) >> 2,437 page-faults # 110.469 K/sec ( +- 0.00% ) >> 78,763,038 cycles # 3.571 GHz ( +- 0.05% ) (77.30%) >> 2,107,182 stalled-cycles-frontend # 2.68% frontend cycles idle ( +- 0.41% ) (77.40%) >> 2,235,371 stalled-cycles-backend # 2.84% backend cycles idle ( +- 1.05% ) (71.39%) >> 67,296,528 instructions # 0.85 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (89.79%) >> 12,483,022 branches # 565.911 M/sec ( +- 0.01% ) (99.73%) >> 384,412 branch-misses # 3.08% of all branches ( +- 0.07% ) (85.91%) >> >> 0.0214224 +- 0.0000875 seconds time elapsed ( +- 0.41% ) >> >> $ perf stat -r 5000 build/interp-bang/bin/java -Xms128m -Xmx128m Hello > /dev/null >> >> Performance counter stats for 'build/interp-bang/bin/java -Xms128m -Xmx128m Hello' (5000 runs): >> >> 21.78 msec task-clock # 1.031 CPUs utilized ( +- 0.05% ) >> 98 context-switches # 4.519 K/sec ( +- 0.07% ) >> 7 cpu-migrations # 339.292 /sec ( +- 0.31% ) >> 2,434 page-faults # 111.755 K/sec ( +- 0.00% ) >> 77,746,317 cycles # 3.569 GHz ( +- 0.05% ) (76.94%) >> 2,143,121 stalled-cycles-frontend # 2.76% frontend cycles idle ( +- 0.45% ) (76.03%) >> 2,059,440 stalled-cycles-backend # 2.65% backend cycles idle
( +- 1.11% ) (71.82%) >> 66,742,892 instructions # 0.86 insn per cycle >> # 0.03 stalled cycles per insn ( +- 0.03% ) (91.40%) >> 12,494,797 branches # 573.634 M/sec ( +- 0.01% ) (99.80%) >> 386,145 branch-misses # 3.09% of all branches ( +- 0.08% ) (85.56%) >> >> 0.0211278 +- 0.0000877 seconds time elapsed ( +- 0.42% ) >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1` >> - [x] Linux x86_64 fastdebug, `tier2` >> - [x] Linux x86_64 fastdebug, `tier3` >> - [x] Linux x86_64 fastdebug, `tier4` >> - [x] Linux x86_32 fastdebug, `tier1` >> - [x] Linux x86_32 fastdebug, `tier2` >> - [x] Linux x86_32 fastdebug, `tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Show watermark in better place on the chart Seems like Power is not affected by this TLB / Cache bottleneck. We use 64k pages and typically 2 store instructions for banging. On the other side, I think it's a good thing to avoid touching any storage which we don't need. So, we could overwork the PPC64 implementation, too (optionally). Or wait until more experiments have been made. ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From shade at openjdk.java.net Fri Feb 11 11:35:20 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 11 Feb 2022 11:35:20 GMT Subject: RFR: 8072070: Improve interpreter stack banging [v5] In-Reply-To: References: <8sseq_si2gPMLJGfdJ33Icebfs_tAdFhPMB1Uszu3dI=.f5a439be-69aa-4aaf-8e0b-5ddf7865b376@github.com> Message-ID: On Fri, 11 Feb 2022 11:29:12 GMT, Martin Doerr wrote: > Seems like Power is not affected by this TLB / Cache bottleneck. We use 64k pages and typically 2 store instructions for banging. On the other side, I think it's a good thing to avoid touching any storage which we don't need. So, we could overwork the PPC64 implementation, too (optionally). Or wait until more experiments have been made. Yes, larger VM pages mean fewer addresses to touch. 
OTOH, in my related experiments with removing the stack banging on compiled entry whatsoever, we seem to redeem single-digit percent improvements, even though we only touch one location far away. Anyhow, I think a good plan is to wait and see if this x86 pilot change runs into any interesting problems, before translating it to other architectures. ------------- PR: https://git.openjdk.java.net/jdk/pull/7247 From duke at openjdk.java.net Fri Feb 11 11:37:56 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Fri, 11 Feb 2022 11:37:56 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v21] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. 
> > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required.
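The sign-on-store / authenticate-on-load idea from the start of this description can be illustrated with a small toy model. This is only a sketch under loud assumptions: the class `PacRetToyModel`, the 16-bit signature width, the key value and the mixing function are all hypothetical and chosen for readability. Real hardware computes the signature with a dedicated cipher and dedicated key registers, and a failed authentication leads to a fault rather than a Java exception. None of this is the code in the PR.

```java
final class PacRetToyModel {
    // Toy layout (assumption): low 48 bits hold the address, high 16 bits hold the signature.
    private static final int PAC_SHIFT = 48;
    private static final long ADDR_MASK = (1L << PAC_SHIFT) - 1;

    // Stand-in keyed mix of address, modifier and key. NOT cryptographically sound;
    // it only plays the role of the hardware's signature function.
    private static long toyPac(long addr, long modifier, long key) {
        long h = (addr & ADDR_MASK) ^ Long.rotateLeft(modifier, 17) ^ key;
        h *= 0x9E3779B97F4A7C15L;
        h ^= h >>> 29;
        return h >>> PAC_SHIFT; // keep 16 bits
    }

    // "Sign" a return address before it is stored on the stack,
    // using the stack pointer as the modifier.
    static long sign(long returnAddress, long stackPointer, long key) {
        long addr = returnAddress & ADDR_MASK;
        return addr | (toyPac(addr, stackPointer, key) << PAC_SHIFT);
    }

    // "Authenticate" a value loaded back from the stack; reject it if the
    // stored signature does not match the recomputed one.
    static long authenticate(long signedPtr, long stackPointer, long key) {
        long addr = signedPtr & ADDR_MASK;
        if ((signedPtr >>> PAC_SHIFT) != toyPac(addr, stackPointer, key)) {
            throw new IllegalStateException("return address failed authentication");
        }
        return addr;
    }

    public static void main(String[] args) {
        long key = 0x0123456789ABCDEFL; // hypothetical per-process key
        long sp = 0x00007FFFFFFFE000L;  // hypothetical stack pointer (modifier)
        long lr = 0x0000AAAAB0001234L;  // hypothetical return address

        long onStack = sign(lr, sp, key);
        System.out.println(Long.toHexString(authenticate(onStack, sp, key)));

        long corrupted = onStack ^ (1L << 52); // flip one bit inside the signature field
        try {
            authenticate(corrupted, sp, key);
            System.out.println("accepted");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the model is only that a value edited on the stack no longer carries a matching signature for that address and stack pointer, so the check on function return fails.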
Alan Hayward has updated the pull request incrementally with two additional commits since the last revision: - Add comments to enter calls - Set PreserveFramePointer if use_rop_protection is set ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/f779513b..2062cce7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=20 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=19-20 Stats: 26 lines in 8 files changed: 16 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Fri Feb 11 11:37:56 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Fri, 11 Feb 2022 11:37:56 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v18] In-Reply-To: References: <8eyrOM5Brgjz4517k80s5RW3HhTDdhevVZOCS8jbIl0=.b41a377e-2235-4310-9b4c-e75e473eb236@github.com> <32e7_CnkkIaj2GOsvi9mT-xzgLO8B60uHrzMEAZXHko=.2ea9eaff-39c6-4401-9820-4536f03d5ec7@github.com> <9wCVZ8gCStf_tUT8_WQjhLzXqqQlQMsijeiBaAXDVVk=.aace6af6-bf1b-40c9-ba19-6fd0ab9b1b0a@github.com> Message-ID: <44zejAzpVh55H_lUbDPm3eCzG5NoUjvO2zJVQRZ83G8=.22ae70e3-76fb-4e33-9cae-41c4a4876a54@github.com> On Thu, 10 Feb 2022 16:32:25 GMT, Alan Hayward wrote: >> Status? Is branch protection really incompatible with PreserveFramePointer? > > Eventually found a missing signing in the exception handling. I'm running the full suite now, so should hopefully get something posted tomorrow. 
New patches fix the failures ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From redestad at openjdk.java.net Fri Feb 11 11:40:07 2022 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 11 Feb 2022 11:40:07 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives In-Reply-To: References: Message-ID: On Wed, 26 Jan 2022 12:51:31 GMT, Claes Redestad wrote: > I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range which only has positive bytes. This allows for dealing much more efficiently with those `byte[]`s that have an ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input. > > Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904 > > - Only implemented on x86 for now, but I want to verify that implementations of `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. This pretty much means holding this up until it's implemented on all platforms, which can either be contributed to this PR or as dependent follow-ups. > > - An alternative to holding up until all platforms are on board is to allow the implementation of `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented. One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field. > > - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`.
Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even sees a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix). > Hi Claes, it can get implemented similarly on PPC64: #7430 You can integrate it if you prefer that, but better after it got a Review. Hi Martin, perfect! Ideally we can get all platforms that have a `hasNegatives` intrinsic moved over so we can just switch it over big-bang style: remove the `@IntrinsicCandidate`, avoid contortions to pick the "right" implementation on the Java level based on which intrinsic is available and drop all VM-internal scaffolding for `hasNegatives`. Then it makes perfect sense to fold your patch into this PR, rather than have a tail of follow-ups. ------------- PR: https://git.openjdk.java.net/jdk/pull/7231 From redestad at openjdk.java.net Fri Feb 11 12:11:54 2022 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 11 Feb 2022 12:11:54 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v2] In-Reply-To: References: Message-ID: > I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range which only has positive bytes. This allows for dealing much more efficiently with those `byte[]`s that have an ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input.
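As a rough illustration of the semantics described in the paragraph above, here is a minimal plain-Java sketch of what a non-intrinsified `countPositives` could look like, together with a `hasNegatives` expressed in terms of it. The class name `CountPositivesSketch` is hypothetical, and this scalar loop is only an assumption about the fallback behaviour; the intrinsic versions discussed in this thread are vectorized and are not shown here.

```java
import java.nio.charset.StandardCharsets;

final class CountPositivesSketch {
    // Returns how many leading bytes in ba[off, off + len) are non-negative,
    // i.e. the length of the ASCII prefix of that range.
    static int countPositives(byte[] ba, int off, int len) {
        int limit = off + len;
        for (int i = off; i < limit; i++) {
            if (ba[i] < 0) {
                return i - off; // first byte with the high bit set ends the prefix
            }
        }
        return len; // the whole range is ASCII
    }

    // The old boolean question falls out of the new count.
    static boolean hasNegatives(byte[] ba, int off, int len) {
        return countPositives(ba, off, len) != len;
    }

    public static void main(String[] args) {
        byte[] ascii = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] mixed = "abc\u00e9def".getBytes(StandardCharsets.UTF_8);
        System.out.println(countPositives(ascii, 0, ascii.length)); // 5
        System.out.println(countPositives(mixed, 0, mixed.length)); // 3
        System.out.println(hasNegatives(mixed, 0, 3));              // false
    }
}
```

A caller that knows the length of the ASCII prefix can copy or decode that prefix in bulk and only fall back to the slower path for the remainder, which is the benefit the description attributes to inputs with an ASCII prefix.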
> > Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904 > > - Only implemented on x86 for now, but I want to verify that implementations of `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. This pretty much means holding this up until it's implemented on all platforms, which can either be contributed to this PR or as dependent follow-ups. > > - An alternative to holding up until all platforms are on board is to allow the implementation of `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented. One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field. > > - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`. Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even sees a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix). Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 23 additional commits since the last revision: - Merge branch 'master' into count_positives - Restore partial vector checks in AVX2 and SSE intrinsic variants - Let countPositives use hasNegatives to allow ports not implementing the countPositives intrinsic to stay neutral - Simplify changes to encodeUTF8 - Fix little-endian error caught by testing - Reduce jumps in the ascii path - Remove unused tail_mask - Remove has_negatives intrinsic on x86 (and hook up 32-bit x86 to use count_positives) - Add more comments, simplify tail branching in AVX512 variant - Resolve issues in the precise implementation - ... and 13 more: https://git.openjdk.java.net/jdk/compare/42073fce...c4bb3612 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7231/files - new: https://git.openjdk.java.net/jdk/pull/7231/files/2a855eb6..c4bb3612 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=00-01 Stats: 18287 lines in 533 files changed: 12765 ins; 2983 del; 2539 mod Patch: https://git.openjdk.java.net/jdk/pull/7231.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7231/head:pull/7231 PR: https://git.openjdk.java.net/jdk/pull/7231 From dholmes at openjdk.java.net Fri Feb 11 12:50:07 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 11 Feb 2022 12:50:07 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v9] In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 08:52:48 GMT, Emanuel Peter wrote: >> Deprecated ExtendedDTraceProbes. >> Edited help messages and man pages accordingly, added the 3 flags to man pages. >> Added flag to VMDeprecatedOptions test. >> Replaced the flag with 3 flags in SDTProbesGNULinuxTest.java. >> >> Checked that tests are not affected. 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix in response to suggestion by David Holmes Marked as reviewed by dholmes (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/7110 From dholmes at openjdk.java.net Fri Feb 11 13:01:04 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 11 Feb 2022 13:01:04 GMT Subject: RFR: 8281585: Remove unused imports under test/lib and jtreg/gc [v2] In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 08:54:51 GMT, Leo Korinth wrote: >> Remove unused imports under test/lib and jtreg/gc. They create lots of warnings if editing using an IDE. Tests in hotspot_gc passed. > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > updating copyright Looks good. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7426 From mdoerr at openjdk.java.net Fri Feb 11 15:38:16 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 11 Feb 2022 15:38:16 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v2] In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 12:11:54 GMT, Claes Redestad wrote: >> I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range which only has positive bytes. This allows for dealing much more efficiently with those `byte[]`s that have an ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input.
>> >> Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904 >> >> - Only implemented on x86 for now, but I want to verify that implementations of `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. This pretty much means holding up this until it's implemented on all platforms, which can either contributed to this PR or as dependent follow-ups. >> >> - An alternative to holding up until all platforms are on board is to allow the implementation of `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented. One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field. >> >> - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`. Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even see a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix). > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 23 additional commits since the last revision: > > - Merge branch 'master' into count_positives > - Restore partial vector checks in AVX2 and SSE intrinsic variants > - Let countPositives use hasNegatives to allow ports not implementing the countPositives intrinsic to stay neutral > - Simplify changes to encodeUTF8 > - Fix little-endian error caught by testing > - Reduce jumps in the ascii path > - Remove unused tail_mask > - Remove has_negatives intrinsic on x86 (and hook up 32-bit x86 to use count_positives) > - Add more comments, simplify tail branching in AVX512 variant > - Resolve issues in the precise implementation > - ... and 13 more: https://git.openjdk.java.net/jdk/compare/811eb365...c4bb3612 Hi Claes, doing it for all platforms and cleaning it up sounds good. My PPC64 contribution is already tested and reviewed. I'll try to find a volunteer for s390 which uses exactly the same algorithm as PPC64. ------------- PR: https://git.openjdk.java.net/jdk/pull/7231 From smonteith at openjdk.java.net Fri Feb 11 15:43:07 2022 From: smonteith at openjdk.java.net (Stuart Monteith) Date: Fri, 11 Feb 2022 15:43:07 GMT Subject: RFR: 8239927: Product variable PrefetchFieldsAhead is unused and should be removed [v4] In-Reply-To: <2xky5BN4tTu-ZmRfO-Um0JjHBDQkJtnekjfT-ax6ZIg=.34e819ac-0e87-490f-a8a5-5fbe857bc9d6@github.com> References: <2xky5BN4tTu-ZmRfO-Um0JjHBDQkJtnekjfT-ax6ZIg=.34e819ac-0e87-490f-a8a5-5fbe857bc9d6@github.com> Message-ID: On Thu, 20 Jan 2022 15:58:09 GMT, Bhavana-Kilambi wrote: >> The product variable "PrefetchFieldsAhead" is defined in gc_globals.hpp and set in vm_version_x86.cpp. >> But as it's not used anywhere, removing this option from the JDK source. > > Bhavana-Kilambi has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: > > - Merge master > - 8239927: Product variable PrefetchFieldsAhead is unused and should be removed Hello @dholmes-ora , would it be possible for you to give this a proper review now it has been CSR approved? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/6783 From redestad at openjdk.java.net Fri Feb 11 15:45:10 2022 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 11 Feb 2022 15:45:10 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v2] In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 12:11:54 GMT, Claes Redestad wrote: >> I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range which only has positive bytes. This allows for dealing much more efficiently with those `byte[]`s that has a ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input. >> >> Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904 >> >> - Only implemented on x86 for now, but I want to verify that implementations of `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. This pretty much means holding up this until it's implemented on all platforms, which can either contributed to this PR or as dependent follow-ups. >> >> - An alternative to holding up until all platforms are on board is to allow the implementation of `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented. 
One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field. >> >> - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`. Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even see a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix). > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 23 additional commits since the last revision: > > - Merge branch 'master' into count_positives > - Restore partial vector checks in AVX2 and SSE intrinsic variants > - Let countPositives use hasNegatives to allow ports not implementing the countPositives intrinsic to stay neutral > - Simplify changes to encodeUTF8 > - Fix little-endian error caught by testing > - Reduce jumps in the ascii path > - Remove unused tail_mask > - Remove has_negatives intrinsic on x86 (and hook up 32-bit x86 to use count_positives) > - Add more comments, simplify tail branching in AVX512 variant > - Resolve issues in the precise implementation > - ... and 13 more: https://git.openjdk.java.net/jdk/compare/690b05fa...c4bb3612 Good! I'm currently reading up on aarch64 asm and trying to port that intrinsic over. It might take some time.. 
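For readers following the 8281146 thread, the contract described in the PR text (return the number of leading bytes in the range that are all positive) can be written as a plain scalar loop. The following is only a minimal sketch under that description, not the JDK's `StringCoding` code or any intrinsic; the class name `CountPositivesSketch` is hypothetical, and deriving `hasNegatives` from `countPositives` is one possible arrangement, shown for illustration.

```java
public class CountPositivesSketch {

    /**
     * Scalar reference: number of leading bytes in ba[off, off + len)
     * that are non-negative (i.e. ASCII when read as signed bytes).
     */
    static int countPositives(byte[] ba, int off, int len) {
        int limit = off + len;
        for (int i = off; i < limit; i++) {
            if (ba[i] < 0) {
                return i - off; // first negative byte ends the prefix
            }
        }
        return len; // no negative byte in the range
    }

    /** A range has negatives exactly when the positive prefix is shorter than the range. */
    static boolean hasNegatives(byte[] ba, int off, int len) {
        return countPositives(ba, off, len) != len;
    }

    public static void main(String[] args) {
        byte[] mixed = {'a', 'b', (byte) 0xE9, 'c'}; // ASCII prefix of length 2
        System.out.println(countPositives(mixed, 0, mixed.length));
        System.out.println(hasNegatives(mixed, 0, mixed.length));
    }
}
```

A caller that knows the length of the ASCII prefix can copy that prefix in bulk and only fall back to a slower path for the remainder, which is the benefit the PR description refers to.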
------------- PR: https://git.openjdk.java.net/jdk/pull/7231 From aph-open at littlepinkcloud.com Fri Feb 11 16:23:15 2022 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Fri, 11 Feb 2022 16:23:15 +0000 Subject: RFC: AArch64: Set Segmented CodeCache default size to 127M In-Reply-To: <64AB1C1E-4151-4979-BF15-CC71D00E98DB@amazon.com> References: <64AB1C1E-4151-4979-BF15-CC71D00E98DB@amazon.com> Message-ID: <155db069-9cdd-6e90-6e02-d87be2ab204b@littlepinkcloud.com> On 2/10/22 23:02, Astigeevich, Evgeny wrote: > We'd like to discuss a proposal for setting TieredCompilation Segmented CodeCache default size to 127M on AArch64 (https://bugs.openjdk.java.net/browse/JDK-8280150). I don't think so, at least not without a lot more information. This would halve the size of the code cache, potentially causing severe regressions in production. I have seen bug reports from customers mystified at poor OpenJDK performance which have turned out to be code cache thrashing. This is very hard to diagnose without making some inspired guesses at what the root cause may be. We'd be moving the threshold for cache exhaustion much closer to our default configuration. So, this is a trade-off between a small expected gain and a much larger (but hopefully rare) loss. I'd like to see more information. What was the *average performance gain* of all your benchmarks? I don't think anyone is interested in cherry-picked best cases. A quick back-of-the-envelope calculation tells me that about 3.5% of the code cache is occupied by trampolines and the extra bytes used by far calls. However, many of the far calls are never needed; I don't have stats for that, but I'd guess about half of them. But given the (plausible?) assumption that the dynamic frequency of calls is the same as the static frequency, I wouldn't be surprised if the cost of trampoline calls is about 2% of the total instruction count, so it'd be nice to be rid of them if there were no cost; but there is a cost.
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at openjdk.java.net Fri Feb 11 16:42:16 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 11 Feb 2022 16:42:16 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v21] In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 11:37:56 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. 
If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few addition sign/auth calls in the backend), but there a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request incrementally with two additional commits since the last revision: > > - Add comments to enter calls > - Set PreserveFramePointer if use_rop_protection is set src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 439: > 437: if (_rop_protection == true) { > 438: PreserveFramePointer = true; > 439: } You need an error message for -PreserveFramePointer +UseROPProtection. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From aph at openjdk.java.net Fri Feb 11 16:52:16 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 11 Feb 2022 16:52:16 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v21] In-Reply-To: References: Message-ID: <_8bU9rjmxZtiKw_7zHvR5kZxEGV0zPYsmLjwwzb78Eg=.41b11771-c173-4492-bcff-400a632a5ed1@github.com> On Fri, 11 Feb 2022 11:37:56 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks.
This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers.
These changes are >> simple to make (they can be reduced to a type change in common code and >> a few addition sign/auth calls in the backend), but there a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request incrementally with two additional commits since the last revision: > > - Add comments to enter calls > - Set PreserveFramePointer if use_rop_protection is set This is looking pretty nice now. With the check for -XX:-UseFramePointer argument consistency we're done. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Fri Feb 11 17:12:15 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Fri, 11 Feb 2022 17:12:15 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v21] In-Reply-To: <_8bU9rjmxZtiKw_7zHvR5kZxEGV0zPYsmLjwwzb78Eg=.41b11771-c173-4492-bcff-400a632a5ed1@github.com> References: <_8bU9rjmxZtiKw_7zHvR5kZxEGV0zPYsmLjwwzb78Eg=.41b11771-c173-4492-bcff-400a632a5ed1@github.com> Message-ID: On Fri, 11 Feb 2022 16:48:33 GMT, Andrew Haley wrote: > This is looking pretty nice now. With the check for -XX:-UseFramePointer argument consistency we're done. Excellent! I'm away all next week, so will add the check when I get back. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From hseigel at openjdk.java.net Fri Feb 11 19:27:49 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 11 Feb 2022 19:27:49 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v4] In-Reply-To: References: Message-ID: <96SsHcTOA9N7kiXUsC4fgpqNYqcbh-CRbP31D7bNerg=.0d59fef5-83df-48a6-a04e-3f0819f076ba@github.com> > Please review this new attempt to resolve JDK-8214976.
This fix adds Pragmas to generate compilation errors, when using gcc, if calling a native system function instead of the os:: version of the function. The fix includes changes to calls in non-shared code because it is cleaner than adding PRAGMAs and, for some cases, the os:: version of a function has added value, such as asserts and RESTARTABLE. This fix slightly changes the signature of os::abort() so it wouldn't conflict with native abort() functions. Changes to Windows code is left for a future RFE. > > This fix was tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, Mach5 tiers 3-5 on Linux x64, and Mach5 builds of Zero, PPC, and s390. > > Thanks, Harold Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: Use new ALLOW_CALL call macro ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7248/files - new: https://git.openjdk.java.net/jdk/pull/7248/files/dd1820eb..abb2b0ac Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7248&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7248&range=02-03 Stats: 91 lines in 6 files changed: 14 ins; 48 del; 29 mod Patch: https://git.openjdk.java.net/jdk/pull/7248.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7248/head:pull/7248 PR: https://git.openjdk.java.net/jdk/pull/7248 From dholmes at openjdk.java.net Sat Feb 12 01:22:19 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Sat, 12 Feb 2022 01:22:19 GMT Subject: RFR: 8239927: Product variable PrefetchFieldsAhead is unused and should be removed [v4] In-Reply-To: <2xky5BN4tTu-ZmRfO-Um0JjHBDQkJtnekjfT-ax6ZIg=.34e819ac-0e87-490f-a8a5-5fbe857bc9d6@github.com> References: <2xky5BN4tTu-ZmRfO-Um0JjHBDQkJtnekjfT-ax6ZIg=.34e819ac-0e87-490f-a8a5-5fbe857bc9d6@github.com> Message-ID: On Thu, 20 Jan 2022 15:58:09 GMT, Bhavana-Kilambi wrote: >> The product variable "PrefetchFieldsAhead" is defined in gc_globals.hpp and set in vm_version_x86.cpp. 
>> But as it's not used anywhere, removing this option from the JDK source. > > Bhavana-Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge master > - 8239927: Product variable PrefetchFieldsAhead is unused and should be removed Looks good and trivial. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6783 From duke at openjdk.java.net Sat Feb 12 09:38:09 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Sat, 12 Feb 2022 09:38:09 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v9] In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 12:46:51 GMT, David Holmes wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix in response to suggestion by David Holmes > > Marked as reviewed by dholmes (Reviewer). Thanks @dholmes-ora , @vnkozlov , @TobiHartmann @hseigel for the reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/7110 From aph-open at littlepinkcloud.com Sat Feb 12 10:23:55 2022 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Sat, 12 Feb 2022 10:23:55 +0000 Subject: RFC: AArch64: Set Segmented CodeCache default size to 127M In-Reply-To: <155db069-9cdd-6e90-6e02-d87be2ab204b@littlepinkcloud.com> References: <64AB1C1E-4151-4979-BF15-CC71D00E98DB@amazon.com> <155db069-9cdd-6e90-6e02-d87be2ab204b@littlepinkcloud.com> Message-ID: <3e317d33-7d48-93ae-3787-886797483d62@littlepinkcloud.com> On 2/11/22 16:23, Andrew Haley wrote: > A quick back-of-the-envelope calculation tells me that about 3.5% of > the code cache is occupied by trampolines and the extra bytes used by > far calls. However, many of the far calls s/far calls/trampolines/ > are never needed; I don't > have stats for that, but I'd guess about half of them. But given the > (plausible ?) 
assumption that the dynamic frequency of calls is the > same as the static frequency, I wouldn't be surprised if the cost of > trampoline calls is about 2% of the total instruction count, so it'd > be nice to be rid of them if there were no cost; but there is a cost. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From duke at openjdk.java.net Sat Feb 12 13:12:06 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Sat, 12 Feb 2022 13:12:06 GMT Subject: Integrated: 8278423: ExtendedDTraceProbes should be deprecated In-Reply-To: References: Message-ID: On Mon, 17 Jan 2022 13:08:17 GMT, Emanuel Peter wrote: > Deprecated ExtendedDTraceProbes. > Edited help messages and man pages accordingly, added the 3 flags to man pages. > Added flag to VMDeprecatedOptions test. > Replaced the flag with 3 flags in SDTProbesGNULinuxTest.java. > > Checked that tests are not affected. This pull request has now been integrated. Changeset: 67077a04 Author: Emanuel Peter Committer: David Holmes URL: https://git.openjdk.java.net/jdk/commit/67077a04307b512219a46b6c4c274ce308ee46de Stats: 36 lines in 5 files changed: 26 ins; 0 del; 10 mod 8278423: ExtendedDTraceProbes should be deprecated Reviewed-by: dholmes, hseigel, kvn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/7110 From dholmes at openjdk.java.net Sat Feb 12 13:53:07 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Sat, 12 Feb 2022 13:53:07 GMT Subject: RFR: 8278423: ExtendedDTraceProbes should be deprecated [v9] In-Reply-To: References: Message-ID: <1F1lA8O1LUcjG7jN9zKWggndsrjftGjyY9SNkGi0IQ0=.6980ebbb-58f5-4a08-97ae-ab4272e47e88@github.com> On Sat, 12 Feb 2022 09:34:28 GMT, Emanuel Peter wrote: >> Marked as reviewed by dholmes (Reviewer). > > Thanks @dholmes-ora , @vnkozlov , @TobiHartmann @hseigel for the reviews. 
@eme64 the test fails in our CI as it encounters builds for which DTrace is not enabled - see JDK-8281675. ------------- PR: https://git.openjdk.java.net/jdk/pull/7110 From jbhateja at openjdk.java.net Sun Feb 13 02:55:14 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sun, 13 Feb 2022 02:55:14 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v2] In-Reply-To: <2TVKx_BFFyAK2ooOWKpdsEIMFzJngYxlWjbgeZ2y4Mc=.5deb2173-8107-476d-92ca-1835d69ce336@github.com> References: <2TVKx_BFFyAK2ooOWKpdsEIMFzJngYxlWjbgeZ2y4Mc=.5deb2173-8107-476d-92ca-1835d69ce336@github.com> Message-ID: On Fri, 21 Jan 2022 00:49:04 GMT, Sandhya Viswanathan wrote: > The JVM currently initializes the x86 mxcsr to round to nearest even, see below in stubGenerator_x86_64.cpp: // Round to nearest (even), 64-bit mode, exceptions masked StubRoutines::x86::_mxcsr_std = 0x1F80; The above works for Math.rint which is specified to be round to nearest even. Please see: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html : section 4.8.4 > > The rounding mode needed for Math.round is round to positive infinity which needs a different x86 mxcsr initialization(0x5F80). Hi @sviswa7 , As per JLS 17 section 15.4, Java follows the round-to-nearest rounding policy for all floating point operations except conversion to integer and remainder, where it uses round toward zero. So it may not be feasible to modify the global MXCSR.RC setting. Modifying the MXCSR setting just before rounding and resetting it to its original value after the operation will also not work, as an OOO processor is free to re-order the LMXCSR instruction if it is used without any barriers, and thus it may also influence other floating point operations. I am pushing an incremental patch which vectorizes existing rounding APIs and is showing significant gain over the existing implementation.
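As a small aid to the rounding-mode discussion above, the difference between `Math.round` and `Math.rint` shows up only on ties. The sketch below is illustrative and uses only the standard `java.lang.Math` methods; the class name `RoundingSketch` and the two helper names are hypothetical and do not come from the patch.

```java
public class RoundingSketch {

    /** Math.round(double): ties are rounded toward positive infinity; returns a long. */
    static long roundTiesUp(double x) {
        return Math.round(x);
    }

    /** Math.rint(double): ties are rounded to the even neighbour; returns a double. */
    static double roundTiesEven(double x) {
        return Math.rint(x);
    }

    public static void main(String[] args) {
        double[] samples = {2.5, 3.5, -2.5, -3.5, 2.4};
        for (double x : samples) {
            System.out.println(x + " round=" + roundTiesUp(x) + " rint=" + roundTiesEven(x));
        }
    }
}
```

The two methods agree away from ties (2.4 rounds to 2 either way) and differ on inputs such as 2.5 and -3.5, which is why a vectorized `Math.round` cannot simply reuse the default round-to-nearest-even hardware mode.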
Best Regards, Jatin ------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From jbhateja at openjdk.java.net Sun Feb 13 03:09:43 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sun, 13 Feb 2022 03:09:43 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v3] In-Reply-To: References: Message-ID: > Summary of changes: > - Intrinsify Math.round(float) and Math.round(double) APIs. > - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. > - Test creation using new IR testing framework. > > Following are the performance number of a JMH micro included with the patch > > Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) > > > Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio > -- | -- | -- | -- | -- | -- | -- | -- > FpRoundingBenchmark.test_round_double | 1024.00 | 584.99 | 1870.70 | 3.20 | 510.35 | 548.60 | 1.07 > FpRoundingBenchmark.test_round_double | 2048.00 | 257.17 | 965.33 | 3.75 | 293.60 | 273.15 | 0.93 > FpRoundingBenchmark.test_round_float | 1024.00 | 825.69 | 3592.54 | 4.35 | 825.32 | 1836.42 | 2.23 > FpRoundingBenchmark.test_round_float | 2048.00 | 388.55 | 1895.77 | 4.88 | 412.31 | 945.82 | 2.29 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - 8279508: Adding vectorized algorithms to match the semantics of rounding operations. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508 - 8279508: Adding a test for scalar intrinsification. 
- 8279508: Auto-vectorize Math.round API ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7094/files - new: https://git.openjdk.java.net/jdk/pull/7094/files/575d2935..2dc364fa Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=01-02 Stats: 33695 lines in 1192 files changed: 23243 ins; 5703 del; 4749 mod Patch: https://git.openjdk.java.net/jdk/pull/7094.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094 PR: https://git.openjdk.java.net/jdk/pull/7094 From duke at openjdk.java.net Sun Feb 13 03:23:06 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Sun, 13 Feb 2022 03:23:06 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v3] In-Reply-To: References: Message-ID: On Sun, 13 Feb 2022 03:09:43 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- | -- >> FpRoundingBenchmark.test_round_double | 1024.00 | 584.99 | 1870.70 | 3.20 | 510.35 | 548.60 | 1.07 >> FpRoundingBenchmark.test_round_double | 2048.00 | 257.17 | 965.33 | 3.75 | 293.60 | 273.15 | 0.93 >> FpRoundingBenchmark.test_round_float | 1024.00 | 825.69 | 3592.54 | 4.35 | 825.32 | 1836.42 | 2.23 >> FpRoundingBenchmark.test_round_float | 2048.00 | 388.55 | 1895.77 | 4.88 | 412.31 | 945.82 | 2.29 >> >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - 8279508: Adding vectorized algorithms to match the semantics of rounding operations. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508 > - 8279508: Adding a test for scalar intrinsification. > - 8279508: Auto-vectorize Math.round API Hi, IIRC for evex encoding you can embed the RC control bit directly in the evex prefix, removing the need to rely on global MXCSR register. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From duke at openjdk.java.net Sun Feb 13 05:18:34 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Sun, 13 Feb 2022 05:18:34 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v3] In-Reply-To: References: Message-ID: <9geCUxBmjKm5HoVrV2HTlD5DSFkJX-GdvlZbPPnzIcM=.ed8260f3-eed5-4f18-9e37-c12a304e9b4e@github.com> > Hi, > > This patch implements the unsigned upcast intrinsics in x86, which are used in vector lane-wise reinterpreting operations. > > Thank you very much. 
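For context on what an unsigned (zero extended) cast means at the scalar level, here is a minimal sketch. It does not use the Vector API or reflect the intrinsic itself; the class name `ZeroExtendSketch` and its helper methods are hypothetical, and only `Byte.toUnsignedInt` is a standard library call.

```java
public class ZeroExtendSketch {

    /** Ordinary Java widening: the sign bit is replicated into the upper bits. */
    static int signExtend(byte b) {
        return b;
    }

    /** Zero extension: the upper bits are cleared, same result as Byte.toUnsignedInt. */
    static int zeroExtend(byte b) {
        return b & 0xFF;
    }

    /** Lane-by-lane zero extension of a byte array into an int array. */
    static int[] zeroExtendAll(byte[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = zeroExtend(src[i]);
        }
        return dst;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;
        System.out.println(signExtend(b) + " vs " + zeroExtend(b));
    }
}
```

A vectorized unsigned cast performs the `zeroExtend` step on every lane at once, which is the operation the PR maps onto x86 instructions.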
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: missing ForceInline ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7358/files - new: https://git.openjdk.java.net/jdk/pull/7358/files/8028be52..cf78527b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7358&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7358&range=01-02 Stats: 10 lines in 2 files changed: 6 ins; 1 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/7358.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7358/head:pull/7358 PR: https://git.openjdk.java.net/jdk/pull/7358 From duke at openjdk.java.net Sun Feb 13 05:18:36 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Sun, 13 Feb 2022 05:18:36 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2] In-Reply-To: References: Message-ID: On Thu, 10 Feb 2022 18:55:29 GMT, Paul Sandoz wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - minor rename >> - address reviews > > Observing the following failures on CPUs with "Intel_R__Xeon_R__Gold_6354_CPU___3.00GHz" with HotSpot flags: > > -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation > > > TestVectorCastAVX512.java: > > Failed IR Rules (1) > ------------------ > - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUI256toL512(int[],long[])": > * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\\\d+(\\\\s){2}(VectorUCastI2X.*)+(\\\\s){2}===.*)", "1"}, applyIfNot={})" > - counts: Graph contains wrong number of nodes: > Regex 1: (\\d+(\\s){2}(VectorUCastI2X.*)+(\\s){2}===.*) > Expected 1 but found 0 nodes. 
> > > TestVectorCastAVX1.java: > > - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toS64(byte[],short[])": > * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\\\d+(\\\\s){2}(VectorUCastB2X.*)+(\\\\s){2}===.*)", "1"}, applyIfNot={})" > - counts: Graph contains wrong number of nodes: > Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*) > Expected 1 but found 0 nodes. > > - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toI128(byte[],int[])": > * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\\\d+(\\\\s){2}(VectorUCastB2X.*)+(\\\\s){2}===.*)", "1"}, applyIfNot={})" > - counts: Graph contains wrong number of nodes: > Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*) > Expected 1 but found 0 nodes. @PaulSandoz Thanks a lot for your testing, the reason seems to be due to `LaneType::asIntegral` missing `ForceInline` annotation. I have run the reshape test 10 times without getting any failure while with previous patch there is often 1 or 2. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/7358 From duke at openjdk.java.net Sun Feb 13 08:39:11 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Sun, 13 Feb 2022 08:39:11 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v3] In-Reply-To: References: Message-ID: On Sun, 13 Feb 2022 03:09:43 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. >> - Test creation using new IR testing framework. 
>> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- | -- >> FpRoundingBenchmark.test_round_double | 1024.00 | 584.99 | 1870.70 | 3.20 | 510.35 | 548.60 | 1.07 >> FpRoundingBenchmark.test_round_double | 2048.00 | 257.17 | 965.33 | 3.75 | 293.60 | 273.15 | 0.93 >> FpRoundingBenchmark.test_round_float | 1024.00 | 825.69 | 3592.54 | 4.35 | 825.32 | 1836.42 | 2.23 >> FpRoundingBenchmark.test_round_float | 2048.00 | 388.55 | 1895.77 | 4.88 | 412.31 | 945.82 | 2.29 >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - 8279508: Adding vectorized algorithms to match the semantics of rounding operations. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508 > - 8279508: Adding a test for scalar intrinsification. > - 8279508: Auto-vectorize Math.round API Also, it seems you have tried using `roundss/sd/ps/pd` followed by a cast to correct the rounding behaviour but decided to take another approach. Some comments around the functions explaining why that is so would be preferable. Thanks. 
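For readers following the rounding discussion, here is a small scalar sketch, not taken from the patch, of why a plain "round instruction followed by a cast" or "add 0.5 and floor" does not reproduce `Math.round`. The class and method names (`RoundSemantics`, `naiveRound`, `rintThenCast`) are made up for illustration, and `rintThenCast` is only an approximation of what a nearest-even round plus convert would compute.

```java
class RoundSemantics {
    // Naive "add 0.5f, then floor". The float addition is itself rounded
    // (to nearest even), so the intermediate sum can be inexact before
    // floor is ever applied.
    public static int naiveRound(float x) {
        return (int) Math.floor(x + 0.5f);
    }

    // Round to nearest, ties to even, then cast. Math.round instead
    // resolves ties toward positive infinity.
    public static int rintThenCast(float x) {
        return (int) Math.rint(x);
    }

    public static void main(String[] args) {
        // 2.5f and -2.5f are ties; 0.49999997f is the largest float below 0.5f.
        float[] inputs = {2.5f, -2.5f, 0.49999997f};
        for (float x : inputs) {
            System.out.println(x
                    + ": Math.round=" + Math.round(x)
                    + " naiveRound=" + naiveRound(x)
                    + " rintThenCast=" + rintThenCast(x));
        }
    }
}
```

For 2.5f, `Math.round` gives 3 while the nearest-even variant gives 2. For 0.49999997f, `Math.round` gives 0 while the naive variant gives 1, because the sum with 0.5f rounds up to exactly 1.0f.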
------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From aph at openjdk.java.net Sun Feb 13 11:01:06 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Sun, 13 Feb 2022 11:01:06 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v3] In-Reply-To: References: Message-ID: On Sun, 13 Feb 2022 03:09:43 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- | -- >> FpRoundingBenchmark.test_round_double | 1024.00 | 584.99 | 1870.70 | 3.20 | 510.35 | 548.60 | 1.07 >> FpRoundingBenchmark.test_round_double | 2048.00 | 257.17 | 965.33 | 3.75 | 293.60 | 273.15 | 0.93 >> FpRoundingBenchmark.test_round_float | 1024.00 | 825.69 | 3592.54 | 4.35 | 825.32 | 1836.42 | 2.23 >> FpRoundingBenchmark.test_round_float | 2048.00 | 388.55 | 1895.77 | 4.88 | 412.31 | 945.82 | 2.29 >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - 8279508: Adding vectorized algorithms to match the semantics of rounding operations. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508 > - 8279508: Adding a test for scalar intrinsification. 
> - 8279508: Auto-vectorize Math.round API src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4066: > 4064: } > 4065: > 4066: void C2_MacroAssembler::vector_cast_double_special_cases_evex(XMMRegister dst, XMMRegister src, XMMRegister xtmp1, What does this do? Comment, even pseudo code, would be nice. ------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From jbhateja at openjdk.java.net Sun Feb 13 13:12:07 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sun, 13 Feb 2022 13:12:07 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v3] In-Reply-To: References: Message-ID: On Sun, 13 Feb 2022 10:58:19 GMT, Andrew Haley wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - 8279508: Adding vectorized algorithms to match the semantics of rounding operations. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508 >> - 8279508: Adding a test for scalar intrinsification. >> - 8279508: Auto-vectorize Math.round API > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4066: > >> 4064: } >> 4065: >> 4066: void C2_MacroAssembler::vector_cast_double_special_cases_evex(XMMRegister dst, XMMRegister src, XMMRegister xtmp1, > > What does this do? Comment, even pseudo code, would be nice. > Hi, IIRC for evex encoding you can embed the RC control bit directly in the evex prefix, removing the need to rely on global MXCSR register. Thanks. Hi @merykitty , You are correct, we can embed RC mode in instruction encoding round instructions (towards -inf,+inf, zero). 
But to match the semantics of Math.round API one needs to add 0.5[f] to input value and then perform rounding over resultant value, which is why @sviswa7 suggested to use a global rounding mode driven by MXCSR.RC so that intermediate floating inexact values also are resolved as desired, but OOO execution may misplace LDMXCSR and hence may have undesired side effects. ------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From jbhateja at openjdk.java.net Sun Feb 13 13:16:16 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sun, 13 Feb 2022 13:16:16 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v3] In-Reply-To: References: Message-ID: On Sun, 13 Feb 2022 13:08:41 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4066: >> >>> 4064: } >>> 4065: >>> 4066: void C2_MacroAssembler::vector_cast_double_special_cases_evex(XMMRegister dst, XMMRegister src, XMMRegister xtmp1, >> >> What does this do? Comment, even pseudo code, would be nice. > >> Hi, IIRC for evex encoding you can embed the RC control bit directly in the evex prefix, removing the need to rely on global MXCSR register. Thanks. > > Hi @merykitty , You are correct, we can embed RC mode in instruction encoding of round instruction (towards -inf,+inf, zero). But to match the semantics of Math.round API one needs to add 0.5[f] to input value and then perform rounding over resultant value, which is why @sviswa7 suggested to use a global rounding mode driven by MXCSR.RC so that intermediate floating inexact values also are resolved as desired, but OOO execution may misplace LDMXCSR and hence may have undesired side effects. > What does this do? Comment, even pseudo code, would be nice. Thanks @theRealAph , I shall append the comments over the routine. BTW, entire rounding algorithm can also be implemented using Vector API which can perform if-conversion using masked operations. 
```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

class roundf {
    public static VectorSpecies<Integer> ISPECIES = IntVector.SPECIES_512;
    public static VectorSpecies<Float> SPECIES = FloatVector.SPECIES_512;

    // Note: there is no tail loop, so a.length must be a multiple of SPECIES.length().
    public static int round_vector(float[] a, int[] r, int ctr) {
        // (significand width - 2 + exponent bias), broadcast to every lane.
        IntVector shiftVBC = (IntVector) ISPECIES.broadcast(24 - 2 + 127);
        for (int i = 0; i < a.length; i += SPECIES.length()) {
            FloatVector fv = FloatVector.fromArray(SPECIES, a, i);
            IntVector iv = fv.reinterpretAsInts();
            // Extract the biased exponent of each lane.
            IntVector biasedExpV = iv.lanewise(VectorOperators.AND, 0x7F800000);
            biasedExpV = biasedExpV.lanewise(VectorOperators.ASHR, 23);
            IntVector shiftV = shiftVBC.lanewise(VectorOperators.SUB, biasedExpV);
            // Lanes whose shift is in [0, 31] take the integer path.
            VectorMask<Integer> cond = shiftV.lanewise(VectorOperators.AND, -32)
                                             .compare(VectorOperators.EQ, 0);
            // Significand with the implicit leading one restored.
            IntVector res = iv.lanewise(VectorOperators.AND, 0x007FFFFF)
                              .lanewise(VectorOperators.OR, 0x007FFFFF + 1);
            VectorMask<Integer> cond1 = iv.compare(VectorOperators.LT, 0);
            VectorMask<Integer> cond2 = cond1.and(cond);
            // Negate only the negative lanes that take the integer path.
            res = res.lanewise(VectorOperators.NEG, cond2);
            res = res.lanewise(VectorOperators.ASHR, shiftV)
                     .lanewise(VectorOperators.ADD, 1)
                     .lanewise(VectorOperators.ASHR, 1);
            // All other lanes fall back to a plain float-to-int conversion.
            res = fv.convert(VectorOperators.F2I, 0)
                    .reinterpretAsInts()
                    .blend(res, cond);
            res.intoArray(r, i);
        }
        return r[ctr];
    }
}
```
------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From duke at openjdk.java.net Mon Feb 14 01:37:09 2022 From: duke at openjdk.java.net (Bhavana-Kilambi) Date: Mon, 14 Feb 2022 01:37:09 GMT Subject: Integrated: 8239927: Product variable PrefetchFieldsAhead is unused and should be removed In-Reply-To: References: Message-ID: On Thu, 9 Dec 2021 11:51:05 GMT, Bhavana-Kilambi wrote: > The product variable "PrefetchFieldsAhead" is defined in gc_globals.hpp and set in vm_version_x86.cpp. > But as it's not used anywhere, removing this option from the JDK source. This pull request has now been integrated.
Changeset: adbe0661 Author: Bhavana Kilambi Committer: Ningsheng Jian URL: https://git.openjdk.java.net/jdk/commit/adbe0661029f12a36a44af52b83b189384d33a27 Stats: 13 lines in 3 files changed: 1 ins; 10 del; 2 mod 8239927: Product variable PrefetchFieldsAhead is unused and should be removed Reviewed-by: njian, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/6783 From ioi.lam at oracle.com Mon Feb 14 06:07:16 2022 From: ioi.lam at oracle.com (Ioi Lam) Date: Sun, 13 Feb 2022 22:07:16 -0800 Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization In-Reply-To: <5d25e7ceeabd9186dd6fe5e9e6e04d0d11ef26c0.camel@redhat.com> References: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> <5dbfb77029a00d67542a9104855b2d98a3d8ce5e.camel@redhat.com> <587acce6-dd30-1f78-caf6-17925c32cae6@oracle.com> <5d25e7ceeabd9186dd6fe5e9e6e04d0d11ef26c0.camel@redhat.com> Message-ID: <3a76d11a-6816-5179-5a32-fd87e94ae90a@oracle.com> On 2/8/2022 3:32 AM, Severin Gehwolf wrote: > On Mon, 2022-02-07 at 22:29 -0800, Ioi Lam wrote: >> On 2022/02/07 10:36, Severin Gehwolf wrote: >>> On Sun, 2022-02-06 at 20:16 -0800, Ioi Lam wrote: >>>> Case (4) is the cause for the bug in JDK-8279484 >>>> >>>> Kubernetes set the cpu.cfs_quota_us to 0 (no limit) and cpu.shares to 2. >>>> This means: >>>> >>>> - This container is guaranteed a minimum amount of CPU resources >>>> - If no other containers are executing, this container can use as >>>> much CPU as available on the host >>>> - If other containers are executing, the amount of CPU available >>>> to this container is (2 / (sum of cpu.shares of all active >>>> containers)) >>>> >>>> >>>> The fundamental problem with the current JVM implementation is that it >>>> treats "CPU request" as a maximum value, the opposite of what Kubernetes >>>> does. Because of this, in case (4), the JVM artificially limits itself >>>> to a single CPU. This leads to CPU underutilization. >>> I agree with your analysis.
Key point is that in such a setup >>> Kubernetes sets CPU shares value to 2. Though, it's a very specific >>> case. >>> >>> In contrast to Kubernetes the JVM doesn't have insight into what other >>> containers are doing (or how they are configured). It would, perhaps, >>> be good to know what Kubernetes does for containers when the >>> environment (i.e. other containers) changes. Do they get restarted? >>> Restarted with different values for cpu shares? >> My understanding is that Kubernetes will try to do load balancing and >> may migrate the containers. According to this: >> >> https://stackoverflow.com/questions/64891872/kubernetes-dynamic-configurationn-of-cpu-resource-limit >> >> If you change the CPU limits, a currently running container will be shut >> down and restarted (using the new limit), and may be relocated to a >> different host if necessary. >> >> I think this means that a JVM process doesn't need to worry about the >> CPU limit changing during its lifetime :-) >>> Either way, what are our options to fix this? Does it need fixing? >>> >>> * Should we no longer take cpu shares as a means to limit CPU into >>> account? It would be a significant change to how previous JDKs >>> worked. Maybe that wouldn't be such a bad idea :) >> I think we should get rid of it. This feature was designed to work with >> Kubernetes, but has no effect in most cases. The only time it takes >> effect (when no resource limits are set) it does the opposite of what >> the user expects. > I tend to agree. We should start with a CSR review of this, though, as > it would be a behavioural change as compared to previous versions of > the JDK. Hi Severin, Sorry for the delay. I've created a CSR. Could you take a look? https://bugs.openjdk.java.net/browse/JDK-8281571 > >> Also, the current implementation is really tied to specific behaviors of >> Kubernetes + docker (the 1024 and 100 constants).
This will cause >> problems with other container/orchestration software that use different >> algorithms and constants. > There are other container orchestration frameworks, like Mesos, which > behave in a similar way (1024 constant is being used). The good news is > that mesos seems to have moved to a hard-limit default. See: > > https://mesos.apache.org/documentation/latest/quota/#deprecated-quota-guarantees > >>> * How likely is CPU underutilization to happen in practise? >>> Considering the container is not the only container on the node, >>> then according to your formula, it'll get one CPU or less anyway. >>> Underutilization would, thus, only happen when it's an idle node >>> with no other containers running. That would suggest to do nothing >>> and let the user override it as they see fit. >> I think under utilization happens when the containers have a bursty >> usage pattern. If other containers do not fully utilize their CPU >> quotas, we should distribute the unused CPUs to the busy containers. > Right, but this isn't really something the JVM process should care > about. It's really a core feature of the orchestration framework to do > that. All we could do is to not limit CPU for those cases. On the other > hand there is the risk of resource starvation too. Consider a node with > many cores, 50 say, and a very small cpu share setting via container > limits. The experience running a JVM application in such a set up would > be very mediocre as the JVM thinks it can use 50 cores (100% of the > time), yet it would only get this when the rest of the > containers/universe is idle. I think we have a general problem that's not specific to containers. If we are running 50 active Java processes on a bare-bone Linux, then each of them would by default use a 50-thread ForkJoinPool.
If each process is given an equal amount of CPU resources, it would make sense for each of them to have a single thread FJP so we can avoid all thread context switching. Or, maybe the Linux kernel is already good enough? If each process is bound to a single physical CPU, context switching between the threads of the same process should be pretty lightweight. It would be worthwhile writing a test case .... Thanks - Ioi > > Thanks, > Severin > From david.holmes at oracle.com Mon Feb 14 07:02:17 2022 From: david.holmes at oracle.com (David Holmes) Date: Mon, 14 Feb 2022 17:02:17 +1000 Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization In-Reply-To: <3a76d11a-6816-5179-5a32-fd87e94ae90a@oracle.com> References: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> <5dbfb77029a00d67542a9104855b2d98a3d8ce5e.camel@redhat.com> <587acce6-dd30-1f78-caf6-17925c32cae6@oracle.com> <5d25e7ceeabd9186dd6fe5e9e6e04d0d11ef26c0.camel@redhat.com> <3a76d11a-6816-5179-5a32-fd87e94ae90a@oracle.com> Message-ID: <0d081302-9dfb-3e48-13c0-8ee151bfb626@oracle.com> On 14/02/2022 4:07 pm, Ioi Lam wrote: > On 2/8/2022 3:32 AM, Severin Gehwolf wrote: >> On Mon, 2022-02-07 at 22:29 -0800, Ioi Lam wrote: >>> On 2022/02/07 10:36, Severin Gehwolf wrote: >>>> On Sun, 2022-02-06 at 20:16 -0800, Ioi Lam wrote: >>>>> Case (4) is the cause for the bug in JDK-8279484 >>>>> >>>>> Kubernetes set the cpu.cfs_quota_us to 0 (no limit) and cpu.shares >>>>> to 2. >>>>> This means: >>>>> >>>>> - This container is guaranteed a minimum amount of CPU resources >>>>> - If no other containers are executing, this container can use as >>>>> much CPU as available on the host >>>>> - If other containers are executing, the amount of CPU available >>>>> to this container is (2 / (sum of cpu.shares of all active >>>>>
containers)) >>>>> >>>>> >>>>> The fundamental problem with the current JVM implementation is that it >>>>> treats "CPU request" as a maximum value, the opposite of what >>>>> Kubernetes >>>>> does. Because of this, in case (4), the JVM artificially limits itself >>>>> to a single CPU. This leads to CPU underutilization. >>>> I agree with your analysis. Key point is that in such a setup >>>> Kubernetes sets CPU shares value to 2. Though, it's a very specific >>>> case. >>>> >>>> In contrast to Kubernetes the JVM doesn't have insight into what other >>>> containers are doing (or how they are configured). It would, perhaps, >>>> be good to know what Kubernetes does for containers when the >>>> environment (i.e. other containers) changes. Do they get restarted? >>>> Restarted with different values for cpu shares? >>> My understanding is that Kubernetes will try to do load balancing and >>> may migrate the containers. According to this: >>> >>> https://stackoverflow.com/questions/64891872/kubernetes-dynamic-configurationn-of-cpu-resource-limit >>> >>> >>> If you change the CPU limits, a currently running container will be shut >>> down and restarted (using the new limit), and may be relocated to a >>> different host if necessary. >>> >>> I think this means that a JVM process doesn't need to worry about the >>> CPU limit changing during its lifetime :-) >>>> Either way, what are our options to fix this? Does it need fixing? >>>> >>>> * Should we no longer take cpu shares as a means to limit CPU into >>>> account? It would be a significant change to how previous JDKs >>>> worked. Maybe that wouldn't be such a bad idea :) >>> I think we should get rid of it. This feature was designed to work with >>> Kubernetes, but has no effect in most cases. The only time it takes >>> effect (when no resource limits are set) it does the opposite of what >>> the user expects. >> I tend to agree.
We should start with a CSR review of this, though, as >> it would be a behavioural change as compared to previous versions of >> the JDK. > > Hi Severin, > > Sorry for the delay. I've created a CSR. Could you take a look? > > https://bugs.openjdk.java.net/browse/JDK-8281571 > >> >>> Also, the current implementation is really tied to specific behaviors of >>> Kubernetes + docker (the 1024 and 100 constants). This will cause >>> problems with other container/orchestration software that use different >>> algorithms and constants. >> There are other container orchestration frameworks, like Mesos, which >> behave in a similar way (1024 constant is being used). The good news is >> that mesos seems to have moved to a hard-limit default. See: >> >> https://mesos.apache.org/documentation/latest/quota/#deprecated-quota-guarantees >> >> >>>> * How likely is CPU underutilization to happen in practise? >>>> Considering the container is not the only container on the node, >>>> then according to your formula, it'll get one CPU or less anyway. >>>> Underutilization would, thus, only happen when it's an idle node >>>> with no other containers running. That would suggest to do nothing >>>> and let the user override it as they see fit. >>> I think under utilization happens when the containers have a bursty >>> usage pattern. If other containers do not fully utilize their CPU >>> quotas, we should distribute the unused CPUs to the busy containers. >> Right, but this isn't really something the JVM process should care >> about. It's really a core feature of the orchestration framework to do >> that. All we could do is to not limit CPU for those cases. On the other >> hand there is the risk of resource starvation too. Consider a node with >> many cores, 50 say, and a very small cpu share setting via container >> limits.
The experience running a JVM application in such a set up would >> be very mediocre as the JVM thinks it can use 50 cores (100% of the >> time), yet it would only get this when the rest of the >> containers/universe is idle. > > I think we have a general problem that's not specific to containers. If > we are running 50 active Java processes on a bare-bone Linux, then each > of them would by default use a 50-thread ForkJoinPool. If each process > is given an equal amount of CPU resources, it would make sense for each > of them to have a single thread FJP so we can avoid all thread context > switching. The JVM cannot optimise this situation because it has no knowledge of the system, its load, or the workload characteristics. It also doesn't know how the scheduler may apportion CPU resources. Sizing heuristics within the JDK itself are pretty basic. If the user/deployer has better knowledge of what would constitute an "optimum" configuration then they have control knobs (system properties, VM flags) they can use to implement that. > Or, maybe the Linux kernel is already good enough? If each process is > bound to a single physical CPU, context switching between the threads of > the same process should be pretty lightweight. It would be worthwhile > writing a test case .... Binding a process to a single CPU would be potentially very bad for some workloads. Neither end-point is likely to be "best" in general. Cheers, David > > Thanks > - Ioi > > >> >> Thanks, >> Severin >> > From shade at openjdk.java.net Mon Feb 14 08:06:19 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 14 Feb 2022 08:06:19 GMT Subject: RFR: 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment In-Reply-To: References: Message-ID: On Tue, 8 Feb 2022 18:19:00 GMT, Aleksey Shipilev wrote: > I am following up on the performance issue where the culprit seems to be the too low `OptoLoopAlignment`. To perform better experiments, I suggest allowing larger alignments.
> > Note that we cannot make `OptoLoopAlignment` larger than `CodeEntryAlignment`, because nmethod copy would break it, see assert in `MacroAssembler::align`. See [JDK-8273459](https://bugs.openjdk.java.net/browse/JDK-8273459) for latest discussion about it. So `CodeEntryAlignment` needs to be configurable as well. > > The default values for options are different per platform, so tests are x86_64 specific. > > No default value is changed, this only unblocks experiments. > > Additional testing: > - [x] New tests on Linux x86_64 fastdebug > - [x] New tests on Linux x86_64 release Anyone? :) ------------- PR: https://git.openjdk.java.net/jdk/pull/7388 From aph at openjdk.java.net Mon Feb 14 09:16:10 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 14 Feb 2022 09:16:10 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v3] In-Reply-To: References: Message-ID: On Sun, 13 Feb 2022 13:12:35 GMT, Jatin Bhateja wrote: >>> Hi, IIRC for evex encoding you can embed the RC control bit directly in the evex prefix, removing the need to rely on global MXCSR register. Thanks. >> >> Hi @merykitty , You are correct, we can embed RC mode in instruction encoding of round instruction (towards -inf,+inf, zero). But to match the semantics of Math.round API one needs to add 0.5[f] to input value and then perform rounding over resultant value, which is why @sviswa7 suggested to use a global rounding mode driven by MXCSR.RC so that intermediate floating inexact values are resolved as desired, but OOO execution may misplace LDMXCSR and hence may have undesired side effects. > >> What does this do? Comment, even pseudo code, would be nice. > > Thanks @theRealAph , I shall append the comments over the routine. > BTW, entire rounding algorithm can also be implemented using Vector API which can perform if-conversion using masked operations. 
> > class roundf { > public static VectorSpecies ISPECIES = IntVector.SPECIES_512; > public static VectorSpecies SPECIES = FloatVector.SPECIES_512; > > public static int round_vector(float[] a, int[] r, int ctr) { > IntVector shiftVBC = (IntVector) ISPECIES.broadcast(24 - 2 + 127); > for (int i = 0; i < a.length; i += SPECIES.length()) { > FloatVector fv = FloatVector.fromArray(SPECIES, a, i); > IntVector iv = fv.reinterpretAsInts(); > IntVector biasedExpV = iv.lanewise(VectorOperators.AND, 0x7F800000); > biasedExpV = biasedExpV.lanewise(VectorOperators.ASHR, 23); > IntVector shiftV = shiftVBC.lanewise(VectorOperators.SUB, biasedExpV); > VectorMask cond = shiftV.lanewise(VectorOperators.AND, -32) > .compare(VectorOperators.EQ, 0); > IntVector res = iv.lanewise(VectorOperators.AND, 0x007FFFFF) > .lanewise(VectorOperators.OR, 0x007FFFFF + 1); > VectorMask cond1 = iv.compare(VectorOperators.LT, 0); > VectorMask cond2 = cond1.and(cond); > res = res.lanewise(VectorOperators.NEG, cond2); > res = res.lanewise(VectorOperators.ASHR, shiftV) > .lanewise(VectorOperators.ADD, 1) > .lanewise(VectorOperators.ASHR, 1); > res = fv.convert(VectorOperators.F2I, 0) > .reinterpretAsInts() > .blend(res, cond); > res.intoArray(r, i); > } > return r[ctr]; > } That pseudocode would make a very useful comment too. This whole patch is very thinly commented. ------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From lkorinth at openjdk.java.net Mon Feb 14 12:08:13 2022 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Mon, 14 Feb 2022 12:08:13 GMT Subject: Integrated: 8281585: Remove unused imports under test/lib and jtreg/gc In-Reply-To: References: Message-ID: On Thu, 10 Feb 2022 15:39:53 GMT, Leo Korinth wrote: > Remove unused imports under test/lib and jtreg/gc. They create lots of warnings if editing using an IDE. Tests in hotspot_gc passed. This pull request has now been integrated. 
Changeset: 2604a88f Author: Leo Korinth URL: https://git.openjdk.java.net/jdk/commit/2604a88fbb6d0f9aec51c7d607ea275bc34a672c Stats: 151 lines in 60 files changed: 0 ins; 92 del; 59 mod 8281585: Remove unused imports under test/lib and jtreg/gc Reviewed-by: dholmes, sspitsyn ------------- PR: https://git.openjdk.java.net/jdk/pull/7426 From lkorinth at openjdk.java.net Mon Feb 14 12:08:12 2022 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Mon, 14 Feb 2022 12:08:12 GMT Subject: RFR: 8281585: Remove unused imports under test/lib and jtreg/gc [v2] In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 08:54:51 GMT, Leo Korinth wrote: >> Remove unused imports under test/lib and jtreg/gc. They create lots of warnings if editing using an IDE. Tests in hotspot_gc passed. > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > updating copyright Thanks David and Serguei! ------------- PR: https://git.openjdk.java.net/jdk/pull/7426 From volker.simonis at gmail.com Mon Feb 14 13:42:15 2022 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 14 Feb 2022 14:42:15 +0100 Subject: Internal compiler error for slowdebug build with gcc 7.5.0 on Ubuntu 18.04 In-Reply-To: References: Message-ID: I found the root cause of this issue. On my machine it was caused by the fact that I've installed version `release-4.6` of systemtap like so: ``` $ git clone git://sourceware.org/git/systemtap.git $ cd systemtap/ $ git checkout release-4.6 $ ./configure && make // no errors $ sudo make install ``` This leads to the described GCC internal error (even with GCC 10.3.0): ``` gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1~18.04~1) ... 
during RTL pass: reload /OpenJDK/Git/jdk/src/hotspot/share/compiler/compileBroker.cpp: In static member function ‘static void CompileBroker::invoke_compiler_on_method(CompileTask*)’: /OpenJDK/Git/jdk/src/hotspot/share/compiler/compileBroker.cpp:2415:1: internal compiler error: maximum number of generated reload insns per insn achieved (90) 2415 | } | ^ Please submit a full bug report, ``` By uninstalling systemtap or by upgrading to a newer version (after `sys/sdt.h fp constraints cont'd, x86-64 edition` [1]) the problem goes away. That systemtap change works around the yet unfixed GCC bug `2028798 - gcc: reload failures on x86-64 after Systemtap 4.6 upgrade` [2] described before. All very strange and maybe another argument for deprecating DTRACE support [3]? [1] https://sourceware.org/git/?p=systemtap.git;a=commit;h=1d3653936fc1fd13135a723a27e6c7e959793ad0 [2] https://bugzilla.redhat.com/show_bug.cgi?id=2028798 [3] https://bugs.openjdk.java.net/browse/JDK-8278423 On Thu, Feb 10, 2022 at 4:08 PM Volker Simonis wrote: > > Hi, > > When compiling the latest HS sources in slowdebug mode with gcc 7.5.0 > (the default compiler on Ubuntu 18.04) I get the following internal > compiler error for the file compileBroker.cpp: > > /OpenJDK/Git/jdk/src/hotspot/share/compiler/compileBroker.cpp: In > static member function 'static void > CompileBroker::invoke_compiler_on_method(CompileTask*)': > /OpenJDK/Git/jdk/src/hotspot/share/compiler/compileBroker.cpp:2393:1: > internal compiler error: Max. number of generated reload insns per > insn is achieved (90) > > } > ^ > Please submit a full bug report, > with preprocessed source if appropriate. > See for instructions. > > I know that gcc 7.5.0 isn't officially supported but was just curious > if somebody has seen this before? Googling around shows that this > issue seems to have been fixed several times in gcc 4.9 and > specifically for ppc/rs6000.
> > I've installed and tried gcc 8.4.0 but the error remains the same: > > GNU C++14 (Ubuntu 8.4.0-1ubuntu1~18.04) version 8.4.0 (x86_64-linux-gnu) > compiled by GNU C version 8.4.0, GMP version 6.1.2, MPFR version > 4.0.1, MPC version 1.1.0, isl version isl-0.19-GMP > > GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 > GNU assembler version 2.30 (x86_64-linux-gnu) using BFD version (GNU > Binutils for Ubuntu) 2.30 > Compiler executable checksum: 67fba09f596cc8a67df33f8529603bfb > during RTL pass: reload > /OpenJDK/Git/jdk/src/hotspot/share/compiler/compileBroker.cpp: In > static member function ‘static void > CompileBroker::invoke_compiler_on_method(CompileTask*)’: > /OpenJDK/Git/jdk/src/hotspot/share/compiler/compileBroker.cpp:2393:1: > internal compiler error: Max. number of generated reload insns per > insn is achieved (90) > > } > ^ > Please submit a full bug report, > with preprocessed source if appropriate. > See for instructions. > > According to the "Supported Build Platforms" Wiki [1] it seems that at > least SAP is using gcc 8. Have you run into this issue as well? Any > ideas how to fix it without upgrading to gcc 10? > > Thank you and best regards, > Volker > > PS: the release build works perfectly fine with gcc 7.5.0 > > [1] https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms From vlivanov at openjdk.java.net Mon Feb 14 13:58:30 2022 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 14 Feb 2022 13:58:30 GMT Subject: RFR: 8280901: MethodHandle::linkToNative stub is missing w/ -Xint Message-ID: MethodHandle::linkToNative linker doesn't have a dedicated stub for interpreter. A stub for compiled code is shared and it is invoked through i2c stub when accessed from interpreter.
In interpreter-only mode, stubs for compiled code are not generated and linkToNative ends up in a broken state where `Method::_from_interpreted_entry` points to `i2c` stub while `Method::_from_compiled_entry` points to `c2i` stub. Proposed fix unconditionally generates a stub for `MethodHandle::linkToNative` case irrespective whether it is a interpreter-only mode or not. Testing: test/jdk/java/foreign/ w/ -Xint ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/7459/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7459&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8280901 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7459.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7459/head:pull/7459 PR: https://git.openjdk.java.net/jdk/pull/7459 From mcimadamore at openjdk.java.net Mon Feb 14 14:20:09 2022 From: mcimadamore at openjdk.java.net (Maurizio Cimadamore) Date: Mon, 14 Feb 2022 14:20:09 GMT Subject: RFR: 8280901: MethodHandle::linkToNative stub is missing w/ -Xint In-Reply-To: References: Message-ID: On Mon, 14 Feb 2022 13:40:32 GMT, Vladimir Ivanov wrote: > MethodHandle::linkToNative linker doesn't have a dedicated stub for interpreter. A stub for compiled code is shared and it is invoked through i2c stub when accessed from interpreter. In interpreter-only mode, stubs for compiled code are not generated and linkToNative ends up in a broken state where `Method::_from_interpreted_entry` points to `i2c` stub while `Method::_from_compiled_entry` points to `c2i` stub. > > Proposed fix unconditionally generates a stub for `MethodHandle::linkToNative` case irrespective whether it is a interpreter-only mode or not. > > Testing: test/jdk/java/foreign/ w/ -Xint Thanks for the fix - maybe consider adding some extra test combinations in TestMatrix (this test is not automated, so it's not run by build and test infra). 
------------- PR: https://git.openjdk.java.net/jdk/pull/7459 From shade at openjdk.java.net Mon Feb 14 17:08:36 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 14 Feb 2022 17:08:36 GMT Subject: RFR: 8281744: x86: Use short jumps in TIG::set_vtos_entry_points Message-ID: Performance in `-Xint` mode seems to be bottlenecked on the code size, rather than particular instruction hotspots, which means code density is important. There are forward branches in `TemplateInterpreterGenerator::set_vtos_entry_points`, which cannot be shortened by `MacroAssembler`, unless we tell it specifically that the upcoming branch target would be within the 8-bit offset. Which it apparently is in this particular case, because there are just a handful of `push`-es between the jump and its target. If a jump offset is more than 8 bits, the interpreter would catch fire just about everywhere, since `set_vtos_entry_points` is used at every bytecode entry. `fastdebug` builds assert the offset sanity directly. Current patch improves `SPECjvm2008:serial` performance in `-Xint` mode for about 7% on Ryzen 7 5700G. (More perf runs pending). There are other places in template interpreter where forward jumps can be short, I'll do them separately, since they are riskier and also less important. 
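As a rough illustration of why the short form matters for code density, here is a minimal Java sketch of the rule an assembler has to apply. It is not HotSpot code: the class and method names are hypothetical, and the only facts relied on are the standard x86 encodings (a short jump carries a signed 8-bit displacement relative to the next instruction and occupies 2 bytes, while the 32-bit forms occupy 5 bytes for `jmp` and 6 bytes for a conditional jump).

```java
// Illustrative sketch only; not part of HotSpot's MacroAssembler.
final class ShortJumpSketch {
    // Standard x86 instruction sizes in bytes:
    // JMP rel8 = 2, JMP rel32 = 5, Jcc rel8 = 2, Jcc rel32 = 6.
    static final int JMP_SHORT = 2;
    static final int JMP_NEAR = 5;
    static final int JCC_SHORT = 2;
    static final int JCC_NEAR = 6;

    // A short jump stores a signed 8-bit displacement, so the target must
    // lie within -128..127 bytes of the end of the jump instruction.
    static boolean fitsInRel8(int displacement) {
        return displacement >= Byte.MIN_VALUE && displacement <= Byte.MAX_VALUE;
    }

    // Size of an unconditional jump for a known displacement.
    static int jmpSize(int displacement) {
        return fitsInRel8(displacement) ? JMP_SHORT : JMP_NEAR;
    }

    // Size of a conditional jump for a known displacement.
    static int jccSize(int displacement) {
        return fitsInRel8(displacement) ? JCC_SHORT : JCC_NEAR;
    }

    public static void main(String[] args) {
        System.out.println(jmpSize(12));   // short form is enough
        System.out.println(jmpSize(300));  // needs the 32-bit form
        System.out.println(jccSize(-128)); // still within the 8-bit range
        System.out.println(jccSize(128));  // just outside the 8-bit range
    }
}
```

For a forward branch the displacement is not yet known when the jump is emitted, which is why the generator has to be told up front that the short form is safe.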
Additional testing: - [x] Linux x86_64 fastdebug, `tier1` - [x] Linux x86_32 fastdebug, `tier1` ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/7463/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7463&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281744 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/7463.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7463/head:pull/7463 PR: https://git.openjdk.java.net/jdk/pull/7463 From jbhateja at openjdk.java.net Mon Feb 14 17:18:07 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 14 Feb 2022 17:18:07 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v3] In-Reply-To: References: Message-ID: On Mon, 14 Feb 2022 09:12:54 GMT, Andrew Haley wrote: >>> What does this do? Comment, even pseudo code, would be nice. >> >> Thanks @theRealAph , I shall append the comments over the routine. >> BTW, entire rounding algorithm can also be implemented using Vector API which can perform if-conversion using masked operations. 
>>
>> class roundf {
>>   public static VectorSpecies<Integer> ISPECIES = IntVector.SPECIES_512;
>>   public static VectorSpecies<Float> SPECIES = FloatVector.SPECIES_512;
>>
>>   public static int round_vector(float[] a, int[] r, int ctr) {
>>     IntVector shiftVBC = (IntVector) ISPECIES.broadcast(24 - 2 + 127);
>>     for (int i = 0; i < a.length; i += SPECIES.length()) {
>>       FloatVector fv = FloatVector.fromArray(SPECIES, a, i);
>>       IntVector iv = fv.reinterpretAsInts();
>>       IntVector biasedExpV = iv.lanewise(VectorOperators.AND, 0x7F800000);
>>       biasedExpV = biasedExpV.lanewise(VectorOperators.ASHR, 23);
>>       IntVector shiftV = shiftVBC.lanewise(VectorOperators.SUB, biasedExpV);
>>       VectorMask<Integer> cond = shiftV.lanewise(VectorOperators.AND, -32)
>>                                        .compare(VectorOperators.EQ, 0);
>>       IntVector res = iv.lanewise(VectorOperators.AND, 0x007FFFFF)
>>                         .lanewise(VectorOperators.OR, 0x007FFFFF + 1);
>>       VectorMask<Integer> cond1 = iv.compare(VectorOperators.LT, 0);
>>       VectorMask<Integer> cond2 = cond1.and(cond);
>>       res = res.lanewise(VectorOperators.NEG, cond2);
>>       res = res.lanewise(VectorOperators.ASHR, shiftV)
>>                .lanewise(VectorOperators.ADD, 1)
>>                .lanewise(VectorOperators.ASHR, 1);
>>       res = fv.convert(VectorOperators.F2I, 0)
>>               .reinterpretAsInts()
>>               .blend(res, cond);
>>       res.intoArray(r, i);
>>     }
>>     return r[ctr];
>>   }
>> }
> > That pseudocode would make a very useful comment too. This whole patch is very thinly commented. > > Hi, IIRC for evex encoding you can embed the RC control bit directly in the evex prefix, removing the need to rely on global MXCSR register. Thanks. > > Hi @merykitty , You are correct, we can embed RC mode in instruction encoding of round instruction (towards -inf,+inf, zero).
But to match the semantics of Math.round API one needs to add 0.5[f] to input value and then perform rounding over resultant value, which is why @sviswa7 suggested to use a global rounding mode driven by MXCSR.RC so that intermediate floating inexact values are resolved as desired, but OOO execution may misplace LDMXCSR and hence may have undesired side effects. **Just want to correct above statement, LDMXCSR will not be re-ordered/re-scheduled early OOO backend.** ------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From shade at openjdk.java.net Mon Feb 14 17:19:09 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 14 Feb 2022 17:19:09 GMT Subject: RFR: 8280901: MethodHandle::linkToNative stub is missing w/ -Xint In-Reply-To: References: Message-ID: On Mon, 14 Feb 2022 13:40:32 GMT, Vladimir Ivanov wrote: > MethodHandle::linkToNative linker doesn't have a dedicated stub for interpreter. A stub for compiled code is shared and it is invoked through i2c stub when accessed from interpreter. In interpreter-only mode, stubs for compiled code are not generated and linkToNative ends up in a broken state where `Method::_from_interpreted_entry` points to `i2c` stub while `Method::_from_compiled_entry` points to `c2i` stub. > > Proposed fix unconditionally generates a stub for `MethodHandle::linkToNative` case irrespective whether it is a interpreter-only mode or not. > > Testing: test/jdk/java/foreign/ w/ -Xint Looks fine to me, thanks for fixing! ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7459 From psandoz at openjdk.java.net Mon Feb 14 19:35:10 2022 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 14 Feb 2022 19:35:10 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v3] In-Reply-To: <9geCUxBmjKm5HoVrV2HTlD5DSFkJX-GdvlZbPPnzIcM=.ed8260f3-eed5-4f18-9e37-c12a304e9b4e@github.com> References: <9geCUxBmjKm5HoVrV2HTlD5DSFkJX-GdvlZbPPnzIcM=.ed8260f3-eed5-4f18-9e37-c12a304e9b4e@github.com> Message-ID: On Sun, 13 Feb 2022 05:18:34 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch implements the unsigned upcast intrinsics in x86, which are used in vector lane-wise reinterpreting operations. >> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > missing ForceInline Marked as reviewed by psandoz (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/7358 From psandoz at openjdk.java.net Mon Feb 14 19:35:10 2022 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 14 Feb 2022 19:35:10 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2] In-Reply-To: References: Message-ID: On Sun, 13 Feb 2022 05:14:47 GMT, Quan Anh Mai wrote: >> Observing the following failures on CPUs with "Intel_R__Xeon_R__Gold_6354_CPU___3.00GHz" with HotSpot flags: >> >> -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation >> >> >> TestVectorCastAVX512.java: >> >> Failed IR Rules (1) >> ------------------ >> - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUI256toL512(int[],long[])": >> * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\\\d+(\\\\s){2}(VectorUCastI2X.*)+(\\\\s){2}===.*)", "1"}, applyIfNot={})" >> - counts: Graph contains wrong number of nodes: >> Regex 1: 
(\\d+(\\s){2}(VectorUCastI2X.*)+(\\s){2}===.*) >> Expected 1 but found 0 nodes. >> >> >> TestVectorCastAVX1.java: >> >> - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toS64(byte[],short[])": >> * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\\\d+(\\\\s){2}(VectorUCastB2X.*)+(\\\\s){2}===.*)", "1"}, applyIfNot={})" >> - counts: Graph contains wrong number of nodes: >> Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*) >> Expected 1 but found 0 nodes. >> >> - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toI128(byte[],int[])": >> * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(\\\\d+(\\\\s){2}(VectorUCastB2X.*)+(\\\\s){2}===.*)", "1"}, applyIfNot={})" >> - counts: Graph contains wrong number of nodes: >> Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*) >> Expected 1 but found 0 nodes. > > @PaulSandoz Thanks a lot for your testing, the reason seems to be due to `LaneType::asIntegral` missing `ForceInline` annotation. I have run the reshape test 10 times without getting any failure while with previous patch there is often 1 or 2. > Thanks. @merykitty testing now passes. Java bits look good. Needs HotSpot reviewer. ------------- PR: https://git.openjdk.java.net/jdk/pull/7358 From hseigel at openjdk.java.net Mon Feb 14 19:48:46 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 14 Feb 2022 19:48:46 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v5] In-Reply-To: References: Message-ID: > Please review this new attempt to resolve JDK-8214976. This fix adds Pragmas to generate compilation errors, when using gcc, if calling a native system function instead of the os:: version of the function. 
The fix includes changes to calls in non-shared code because it is cleaner than adding PRAGMAs and, for some cases, the os:: version of a function has added value, such as asserts and RESTARTABLE. This fix slightly changes the signature of os::abort() so it wouldn't conflict with native abort() functions. Changes to Windows code is left for a future RFE. > > This fix was tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, Mach5 tiers 3-5 on Linux x64, and Mach5 builds of Zero, PPC, and s390. > > Thanks, Harold Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: rename macro, fix semi-colon issue, fix zero lseek64 and ftruncate64 build issue ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7248/files - new: https://git.openjdk.java.net/jdk/pull/7248/files/abb2b0ac..d062fb50 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7248&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7248&range=03-04 Stats: 25 lines in 6 files changed: 0 ins; 0 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/7248.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7248/head:pull/7248 PR: https://git.openjdk.java.net/jdk/pull/7248 From rehn at openjdk.java.net Mon Feb 14 20:05:11 2022 From: rehn at openjdk.java.net (Robbin Ehn) Date: Mon, 14 Feb 2022 20:05:11 GMT Subject: RFR: 8281744: x86: Use short jumps in TIG::set_vtos_entry_points In-Reply-To: References: Message-ID: On Mon, 14 Feb 2022 15:47:41 GMT, Aleksey Shipilev wrote: > Performance in `-Xint` mode seems to be bottlenecked on the code size, rather than particular instruction hotspots, which means code density is important. > > There are forward branches in `TemplateInterpreterGenerator::set_vtos_entry_points`, which cannot be shortened by `MacroAssembler`, unless we tell it specifically that the upcoming branch target would be within the 8-bit offset. 
Which it apparently is in this particular case, because there are just a handful of `push`-es between the jump and its target. If a jump offset is more than 8 bits, the interpreter would catch fire just about everywhere, since `set_vtos_entry_points` is used at every bytecode entry. `fastdebug` builds assert the offset sanity directly. > > Current patch improves `SPECjvm2008:serial` performance in `-Xint` mode for about 7% on Ryzen 7 5700G. (More perf runs pending). > > There are other places in template interpreter where forward jumps can be short, I'll do them separately, since they are riskier and also less important. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1` > - [x] Linux x86_32 fastdebug, `tier1` Thanks ------------- Marked as reviewed by rehn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7463 From coleenp at openjdk.java.net Mon Feb 14 20:37:10 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 14 Feb 2022 20:37:10 GMT Subject: RFR: 8281744: x86: Use short jumps in TIG::set_vtos_entry_points In-Reply-To: References: Message-ID: On Mon, 14 Feb 2022 15:47:41 GMT, Aleksey Shipilev wrote: > Performance in `-Xint` mode seems to be bottlenecked on the code size, rather than particular instruction hotspots, which means code density is important. > > There are forward branches in `TemplateInterpreterGenerator::set_vtos_entry_points`, which cannot be shortened by `MacroAssembler`, unless we tell it specifically that the upcoming branch target would be within the 8-bit offset. Which it apparently is in this particular case, because there are just a handful of `push`-es between the jump and its target. If a jump offset is more than 8 bits, the interpreter would catch fire just about everywhere, since `set_vtos_entry_points` is used at every bytecode entry. `fastdebug` builds assert the offset sanity directly. > > Current patch improves `SPECjvm2008:serial` performance in `-Xint` mode for about 7% on Ryzen 7 5700G. 
(More perf runs pending). > > There are other places in template interpreter where forward jumps can be short, I'll do them separately, since they are riskier and also less important. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1` > - [x] Linux x86_32 fastdebug, `tier1` Looks good ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7463 From kvn at openjdk.java.net Mon Feb 14 21:14:10 2022 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 14 Feb 2022 21:14:10 GMT Subject: RFR: 8280901: MethodHandle::linkToNative stub is missing w/ -Xint In-Reply-To: References: Message-ID: On Mon, 14 Feb 2022 13:40:32 GMT, Vladimir Ivanov wrote: > MethodHandle::linkToNative linker doesn't have a dedicated stub for interpreter. A stub for compiled code is shared and it is invoked through i2c stub when accessed from interpreter. In interpreter-only mode, stubs for compiled code are not generated and linkToNative ends up in a broken state where `Method::_from_interpreted_entry` points to `i2c` stub while `Method::_from_compiled_entry` points to `c2i` stub. > > Proposed fix unconditionally generates a stub for `MethodHandle::linkToNative` case irrespective whether it is a interpreter-only mode or not. > > Testing: test/jdk/java/foreign/ w/ -Xint Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/7459 From dholmes at openjdk.java.net Mon Feb 14 22:49:14 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 14 Feb 2022 22:49:14 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v5] In-Reply-To: References: Message-ID: On Mon, 14 Feb 2022 19:48:46 GMT, Harold Seigel wrote: >> Please review this new attempt to resolve JDK-8214976. This fix adds Pragmas to generate compilation errors, when using gcc, if calling a native system function instead of the os:: version of the function. 
The fix includes changes to calls in non-shared code because it is cleaner than adding PRAGMAs and, for some cases, the os:: version of a function has added value, such as asserts and RESTARTABLE. This fix slightly changes the signature of os::abort() so it wouldn't conflict with native abort() functions. Changes to Windows code is left for a future RFE. >> >> This fix was tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, Mach5 tiers 3-5 on Linux x64, and Mach5 builds of Zero, PPC, and s390. >> >> Thanks, Harold > > Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: > > rename macro, fix semi-colon issue, fix zero lseek64 and ftruncate64 build issue Hi Harold, This is looking better - thanks - but I think the lseek64 situation needs handling differently. Thanks, David src/hotspot/os/linux/os_linux.cpp line 4924: > 4922: } > 4923: > 4924: off64_t call_lseek64(int fd, off64_t offset, int whence) { I think it would be better to just change the `lseek64` calls to `os::lseek` rather than introduce this wrapper function. ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7248 From sviswanathan at openjdk.java.net Tue Feb 15 02:14:14 2022 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 15 Feb 2022 02:14:14 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v3] In-Reply-To: <9geCUxBmjKm5HoVrV2HTlD5DSFkJX-GdvlZbPPnzIcM=.ed8260f3-eed5-4f18-9e37-c12a304e9b4e@github.com> References: <9geCUxBmjKm5HoVrV2HTlD5DSFkJX-GdvlZbPPnzIcM=.ed8260f3-eed5-4f18-9e37-c12a304e9b4e@github.com> Message-ID: On Sun, 13 Feb 2022 05:18:34 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch implements the unsigned upcast intrinsics in x86, which are used in vector lane-wise reinterpreting operations. >> >> Thank you very much. 
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > missing ForceInline Marked as reviewed by sviswanathan (Reviewer). Hotspot changes look good to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/7358 From dlong at openjdk.java.net Tue Feb 15 04:11:14 2022 From: dlong at openjdk.java.net (Dean Long) Date: Tue, 15 Feb 2022 04:11:14 GMT Subject: RFR: 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment In-Reply-To: References: Message-ID: On Tue, 8 Feb 2022 18:19:00 GMT, Aleksey Shipilev wrote: > I am following up on the performance issue where the culprit seems to be the too low `OptoLoopAlignment`. To perform better experiments, I suggest allowing larger alignments. > > Note that we cannot make `OptoLoopAlignment` larger than `CodeEntryAlignment`, because nmethod copy would break it, see assert in `MacroAssembler::align`. See [JDK-8273459](https://bugs.openjdk.java.net/browse/JDK-8273459) for latest discussion about it. So `CodeEntryAlignment` needs to be configurable as well. > > The default values for options are different per platform, so tests are x86_64 specific. > > No default value is changed, this only unblocks experiments. > > Additional testing: > - [x] New tests on Linux x86_64 fastdebug > - [x] New tests on Linux x86_64 release Looks good. ------------- Marked as reviewed by dlong (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7388 From ioi.lam at oracle.com Tue Feb 15 05:20:54 2022 From: ioi.lam at oracle.com (Ioi Lam) Date: Mon, 14 Feb 2022 21:20:54 -0800 Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization In-Reply-To: <0d081302-9dfb-3e48-13c0-8ee151bfb626@oracle.com> References: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> <5dbfb77029a00d67542a9104855b2d98a3d8ce5e.camel@redhat.com> <587acce6-dd30-1f78-caf6-17925c32cae6@oracle.com> <5d25e7ceeabd9186dd6fe5e9e6e04d0d11ef26c0.camel@redhat.com> <3a76d11a-6816-5179-5a32-fd87e94ae90a@oracle.com> <0d081302-9dfb-3e48-13c0-8ee151bfb626@oracle.com> Message-ID: <99d70a98-e5f2-459e-1606-f2ee4edcfb6f@oracle.com> On 2/13/2022 11:02 PM, David Holmes wrote: > On 14/02/2022 4:07 pm, Ioi Lam wrote: >> On 2/8/2022 3:32 AM, Severin Gehwolf wrote: >>> On Mon, 2022-02-07 at 22:29 -0800, Ioi Lam wrote: >>>> On 2022/02/07 10:36, Severin Gehwolf wrote: >>>>> On Sun, 2022-02-06 at 20:16 -0800, Ioi Lam wrote: >>>>>> Case (4) is the cause for the bug in JDK-8279484 >>>>>> >>>>>> Kubernetes set the cpu.cfs_quota_us to 0 (no limit) and >>>>>> cpu.shares to 2. >>>>>> This means: >>>>>> >>>>>> - This container is guaranteed a minimum amount of CPU resources >>>>>> - If no other containers are executing, this container can use as >>>>>> much CPU as available on the host >>>>>> - If other containers are executing, the amount of CPU available >>>>>> to this container is (2 / (sum of cpu.shares of all active >>>>>> containers)) >>>>>> >>>>>> >>>>>> The fundamental problem with the current JVM implementation is >>>>>> that it >>>>>> treats "CPU request" as a maximum value, the opposite of what >>>>>> Kubernetes >>>>>> does. Because of this, in case (4), the JVM artificially limits >>>>>> itself >>>>>> to a single CPU. This leads to CPU underutilization. >>>>> I agree with your analysis. Key point is that in such a setup >>>>> Kubernetes sets CPU shares value to 2.
Though, it's a very specific >>>>> case. >>>>> >>>>> In contrast to Kubernetes the JVM doesn't have insight into what >>>>> other >>>>> containers are doing (or how they are configured). It would, perhaps, >>>>> be good to know what Kubernetes does for containers when the >>>>> environment (i.e. other containers) changes. Do they get restarted? >>>>> Restarted with different values for cpu shares? >>>> My understanding is that Kubernetes will try to do load balancing and >>>> may migrate the containers. According to this: >>>> >>>> https://stackoverflow.com/questions/64891872/kubernetes-dynamic-configurationn-of-cpu-resource-limit >>>> >>>> >>>> If you change the CPU limits, a currently running container will be >>>> shut >>>> down and restarted (using the new limit), and may be relocated to a >>>> different host if necessary. >>>> >>>> I think this means that a JVM process doesn't need to worry about the >>>> CPU limit changing during its lifetime :-) >>>>> Either way, what are our options to fix this? Does it need fixing? >>>>> >>>>> * Should we no longer take cpu shares as a means to limit CPU into >>>>> account? It would be a significant change to how previous JDKs >>>>> worked. Maybe that wouldn't be such a bad idea :) >>>> I think we should get rid of it. This feature was designed to work >>>> with >>>> Kubernetes, but has no effect in most cases. The only time it takes >>>> effect (when no resource limits are set) it does the opposite of what >>>> the user expects. >>> I tend to agree. We should start with a CSR review of this, though, as >>> it would be a behavioural change as compared to previous versions of >>> the JDK. >> >> Hi Severin, >> >> Sorry for the delay. I've created a CSR. Could you take a look? >> >> https://bugs.openjdk.java.net/browse/JDK-8281571 >> >>> >>>> Also, the current implementation is really tied to specific >>>> behaviors of >>>> Kubernetes + docker (the 1024 and 100 constants).
This will cause >>>> problems with other container/orchestration software that use >>>> different >>>> algorithms and constants. >>> There are other container orchestration frameworks, like Mesos, which >>> behave in a similar way (1024 constant is being used). The good news is >>> that mesos seems to have moved to a hard-limit default. See: >>> >>> https://mesos.apache.org/documentation/latest/quota/#deprecated-quota-guarantees >>> >>> >>>>> * How likely is CPU underutilization to happen in practise? >>>>> Considering the container is not the only container on the node, >>>>> then according to your formula, it'll get one CPU or less >>>>> anyway. >>>>> Underutilization would, thus, only happen when it's an idle node >>>>> with no other containers running. That would suggest to do >>>>> nothing >>>>> and let the user override it as they see fit. >>>> I think under utilization happens when the containers have a bursty >>>> usage pattern. If other containers do not fully utilize their CPU >>>> quotas, we should distribute the unused CPUs to the busy containers. >>> Right, but this isn't really something the JVM process should care >>> about. It's really a core feature of the orchestration framework to do >>> that. All we could do is to not limit CPU for those cases. On the other >>> hand there is the risk of resource starvation too. Consider a node with >>> many cores, 50 say, and a very small cpu share setting via container >>> limits. The experience running a JVM application in such a set up would >>> be very mediocre as the JVM thinks it can use 50 cores (100% of the >>> time), yet it would only get this when the rest of the >>> containers/universe is idle. >> >> I think we have a general problem that's not specific to containers. >> If we are running 50 active Java processes on a bare-bone Linux, then >> each of them would by default use a 50-thread ForkJoinPool.
In each >> process is given an equal amount of CPU resources, it would make >> sense for each of them to have a single thread FJP so we can avoid >> all thread context switching. > > The JVM cannot optimise this situation because it has no knowledge of > the system, its load, or the workload characteristics. It also doesn't > know how the scheduler may apportion CPU resources. Sizing heuristics > within the JDK itself are pretty basic. If the user/deployer has > better knowledge of what would constitute an "optimum" configuration > then they have control knobs (system properties, VM flags) they can > use to implement that. > >> Or, maybe the Linux kernel is already good enough? If each process is >> bound to a single physical CPU, context switching between the threads >> of the same process should be pretty lightweight. It would be >> worthwhile writing a test case .... > > Binding a process to a single CPU would be potentially very bad for > some workloads. Neither end-point is likely to be "best" in general. > I found some interesting numbers. I think this means we don't accomplish much by restricting the size of thread pools from a relatively small number (the number of physical CPUs, 3 digit or less) to an even smaller number computed by CgroupSubsystem::active_processor_count(). https://eli.thegreenplace.net/2018/measuring-context-switching-and-memory-overheads-for-linux-threads/ [Cost for each context switch is] somewhere between 1.2 and 1.5 microseconds per context switch ... Is 1-2 us a long time? As I have mentioned in the post on launch overheads, a good comparison is memcpy, which takes 3 us for 64 KiB on the same machine. In other words, a context switch is a bit quicker than copying 64 KiB of memory from one location to another. ... Conclusion The numbers reported here paint an interesting picture on the state of Linux multi-threaded performance in 2018. 
I would say that the limits still exist - running a million threads is probably not going to make sense; however, the limits have definitely shifted since the past, and a lot of folklore from the early 2000s doesn't apply today. On a beefy multi-core machine with lots of RAM we can easily run 10,000 threads in a single process today, in production. So after the proposed change, some users may be surprised, "why do I now have 32 threads sleeping inside my containerized app", but the actual CPU/memory cost would be minimal, with a large potential up side -- the app can run much faster when the rest of the system is quiet. (I ran a small test on Linux x64 and the cost per Java thread is about 90KB). Thanks - Ioi From duke at openjdk.java.net Tue Feb 15 05:41:13 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Tue, 15 Feb 2022 05:41:13 GMT Subject: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v3] In-Reply-To: <9geCUxBmjKm5HoVrV2HTlD5DSFkJX-GdvlZbPPnzIcM=.ed8260f3-eed5-4f18-9e37-c12a304e9b4e@github.com> References: <9geCUxBmjKm5HoVrV2HTlD5DSFkJX-GdvlZbPPnzIcM=.ed8260f3-eed5-4f18-9e37-c12a304e9b4e@github.com> Message-ID: On Sun, 13 Feb 2022 05:18:34 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch implements the unsigned upcast intrinsics in x86, which are used in vector lane-wise reinterpreting operations. >> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > missing ForceInline Thanks a lot for your reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7358 From david.holmes at oracle.com Tue Feb 15 05:50:10 2022 From: david.holmes at oracle.com (David Holmes) Date: Tue, 15 Feb 2022 15:50:10 +1000 Subject: [RFC containers] 8281181 JDK's interpretation of CPU Shares causes underutilization In-Reply-To: <99d70a98-e5f2-459e-1606-f2ee4edcfb6f@oracle.com> References: <5636636e-3ef9-0087-f3f4-8ef15d618489@oracle.com> <5dbfb77029a00d67542a9104855b2d98a3d8ce5e.camel@redhat.com> <587acce6-dd30-1f78-caf6-17925c32cae6@oracle.com> <5d25e7ceeabd9186dd6fe5e9e6e04d0d11ef26c0.camel@redhat.com> <3a76d11a-6816-5179-5a32-fd87e94ae90a@oracle.com> <0d081302-9dfb-3e48-13c0-8ee151bfb626@oracle.com> <99d70a98-e5f2-459e-1606-f2ee4edcfb6f@oracle.com> Message-ID: <4a9ee526-dfcd-d02f-0ec9-692a91a76d90@oracle.com> Trimming ... On 15/02/2022 3:20 pm, Ioi Lam wrote: > On 2/13/2022 11:02 PM, David Holmes wrote: >> On 14/02/2022 4:07 pm, Ioi Lam wrote: >>> I think we have a general problem that's not specific to containers. >>> If we are running 50 active Java processes on a bare-bone Linux, then >>> each of them would by default use a 50-thread ForkJoinPool. If each >>> process is given an equal amount of CPU resources, it would make >>> sense for each of them to have a single thread FJP so we can avoid >>> all thread context switching. >> >> The JVM cannot optimise this situation because it has no knowledge of >> the system, its load, or the workload characteristics. It also doesn't >> know how the scheduler may apportion CPU resources. Sizing heuristics >> within the JDK itself are pretty basic. If the user/deployer has >> better knowledge of what would constitute an "optimum" configuration >> then they have control knobs (system properties, VM flags) they can >> use to implement that. >> >>> Or, maybe the Linux kernel is already good enough? If each process is >>> bound to a single physical CPU, context switching between the threads >>> of the same process should be pretty lightweight.
It would be >>> worthwhile writing a test case .... >> >> Binding a process to a single CPU would be potentially very bad for >> some workloads. Neither end-point is likely to be "best" in general. >> > > I found some interesting numbers. I think this means we don't accomplish > much by restricting the size of thread pools from a relatively small > number (the number of physical CPUs, 3 digit or less) to an even smaller > number computed by CgroupSubsystem::active_processor_count(). > > https://eli.thegreenplace.net/2018/measuring-context-switching-and-memory-overheads-for-linux-threads/ > > > > [Cost for each context switch is] somewhere between 1.2 and 1.5 > microseconds per context switch ... Is 1-2 us a long time? As I have > mentioned in the post on launch overheads, a good comparison is memcpy, > which takes 3 us for 64 KiB on the same machine. In other words, a > context switch is a bit quicker than copying 64 KiB of memory from one > location to another. > ... > Conclusion > The numbers reported here paint an interesting picture on the state of > Linux multi-threaded performance in 2018. I would say that the limits > still exist - running a million threads is probably not going to make > sense; however, the limits have definitely shifted since the past, and a > lot of folklore from the early 2000s doesn't apply today. On a beefy > multi-core machine with lots of RAM we can easily run 10,000 threads in > a single process today, in production. > I agree that the under-utilization caused by the way shares is currently used is bad. But I don't see how the above really relates to that at all. The above is primarily about the RAM cost of threads - and I agree it's better now than it used to be, so a system can support many more threads than it used to. But the main issue with sizing thread pools etc is about effective servicing of load to either achieve throughput or response time goals. 
Too many threads, just like having too many of any kind of worker, can be very inefficient when they just get in each other's way. Cheers, David ----- > So after the proposed change, some users may be surprised, "why do I now > have 32 threads sleeping inside my containerized app", but the actual > CPU/memory cost would be minimal, with a large potential up side -- the > app can run much faster when the rest of the system is quiet. > > (I ran a small test on Linux x64 and the cost per Java thread is about > 90KB). > > Thanks > - Ioi > From shade at openjdk.java.net Tue Feb 15 06:22:15 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 15 Feb 2022 06:22:15 GMT Subject: RFR: 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment In-Reply-To: References: Message-ID: On Tue, 8 Feb 2022 18:19:00 GMT, Aleksey Shipilev wrote: > I am following up on the performance issue where the culprit seems to be the too low `OptoLoopAlignment`. To perform better experiments, I suggest allowing larger alignments. > > Note that we cannot make `OptoLoopAlignment` larger than `CodeEntryAlignment`, because nmethod copy would break it, see assert in `MacroAssembler::align`. See [JDK-8273459](https://bugs.openjdk.java.net/browse/JDK-8273459) for latest discussion about it. So `CodeEntryAlignment` needs to be configurable as well. > > The default values for options are different per platform, so tests are x86_64 specific. > > No default value is changed, this only unblocks experiments. > > Additional testing: > - [x] New tests on Linux x86_64 fastdebug > - [x] New tests on Linux x86_64 release Thank you!
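The alignment constraint in the quoted description can be illustrated with plain address arithmetic. The sketch below is a simplified model, not HotSpot code: it assumes that a code blob is copied to a destination guaranteed to be aligned only to the entry alignment, and that a loop sits at a fixed offset inside the blob. The class and method names are hypothetical, and the numbers stand in for `CodeEntryAlignment` and `OptoLoopAlignment` values chosen for the example.

```java
// Illustrative sketch only; a simplified model, not HotSpot code.
final class AlignmentSketch {
    // True if 'address' is a multiple of 'alignment' (a power of two).
    static boolean isAligned(long address, long alignment) {
        return (address & (alignment - 1)) == 0;
    }

    // Padding needed to move 'address' up to the next multiple of 'alignment'.
    static long padding(long address, long alignment) {
        return (alignment - (address & (alignment - 1))) & (alignment - 1);
    }

    // A loop at 'loopOffset' inside a blob keeps its alignment after the blob
    // is copied to 'newBase' only if the final address is still a multiple
    // of 'loopAlignment'.
    static boolean loopStaysAligned(long newBase, long loopOffset, long loopAlignment) {
        return isAligned(newBase + loopOffset, loopAlignment);
    }

    public static void main(String[] args) {
        long entryAlignment = 32; // stand-in for CodeEntryAlignment
        long loopOffset = 64;     // loop placed at a multiple of 64 inside the blob

        long baseA = 0x1000;      // aligned to 32 and, by luck, also to 64
        long baseB = 0x1020;      // aligned to 32 but not to 64

        // Both destinations satisfy the entry alignment.
        System.out.println(isAligned(baseA, entryAlignment));
        System.out.println(isAligned(baseB, entryAlignment));

        // A 64-byte loop alignment survives the copy only by luck.
        System.out.println(loopStaysAligned(baseA, loopOffset, 64));
        System.out.println(loopStaysAligned(baseB, loopOffset, 64));

        // A loop alignment no larger than the entry alignment always survives.
        System.out.println(loopStaysAligned(baseB, loopOffset, 32));
    }
}
```

Under these assumptions, a loop alignment that does not exceed the entry alignment is preserved by any copy to an entry-aligned destination, which is one way to read the requirement that `CodeEntryAlignment` be raised together with `OptoLoopAlignment`.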
------------- PR: https://git.openjdk.java.net/jdk/pull/7388 From shade at openjdk.java.net Tue Feb 15 06:22:17 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 15 Feb 2022 06:22:17 GMT Subject: Integrated: 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment In-Reply-To: References: Message-ID: <2gHgw8d9J0G5-tMO5mB-JYhTkfgt5goIFj92lzxcOlU=.37834a48-0cbc-4c6f-ab93-be942e7a640a@github.com> On Tue, 8 Feb 2022 18:19:00 GMT, Aleksey Shipilev wrote: > I am following up on the performance issue where the culprit seems to be the too low `OptoLoopAlignment`. To perform better experiments, I suggest allowing larger alignments. > > Note that we cannot make `OptoLoopAlignment` larger than `CodeEntryAlignment`, because nmethod copy would break it, see assert in `MacroAssembler::align`. See [JDK-8273459](https://bugs.openjdk.java.net/browse/JDK-8273459) for latest discussion about it. So `CodeEntryAlignment` needs to be configurable as well. > > The default values for options are different per platform, so tests are x86_64 specific. > > No default value is changed, this only unblocks experiments. > > Additional testing: > - [x] New tests on Linux x86_64 fastdebug > - [x] New tests on Linux x86_64 release This pull request has now been integrated. 
Changeset: b1564624 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/b1564624ce454d0df9b2464424b7b5e449481ee6 Stats: 178 lines in 4 files changed: 176 ins; 0 del; 2 mod 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/7388 From kbarrett at openjdk.java.net Tue Feb 15 06:52:08 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 15 Feb 2022 06:52:08 GMT Subject: RFR: 8280916: Simplify HotSpot Style Guide editorial changes In-Reply-To: <-94XsVuSzzJ68iz1GZCqu4BXZOp9OVA6t9R7tvaUIU4=.108377db-1312-45de-aafc-88b980e41778@github.com> References: <-94XsVuSzzJ68iz1GZCqu4BXZOp9OVA6t9R7tvaUIU4=.108377db-1312-45de-aafc-88b980e41778@github.com> Message-ID: On Mon, 31 Jan 2022 22:14:32 GMT, Vladimir Kozlov wrote: >> Please review this change to the HotSpot Style Guide change process. >> >> The current process involves gathering consensus among the HotSpot Group >> Members. That's fine for changes of substance. But it seems overly weighty >> for editorial changes that don't affect the substance of the guide, but only >> its clarity or accuracy. >> >> The proposed change would permit the normal PR process to be used for such >> changes, but require the requisite reviewers to additionally be HotSpot Group >> Members. >> >> Note that there have already been a couple of changes that effectively >> followed the proposed new process. >> https://bugs.openjdk.java.net/browse/JDK-8274169 >> https://bugs.openjdk.java.net/browse/JDK-8280182 >> >> This is a modification of the Style Guide, so rough consensus among the >> HotSpot Group members is required to make this change. Only Group members >> should vote for approval (via the github PR), though reasoned objections or >> comments from anyone will be considered. A decision on this proposal will not >> be made before Monday 14-Feb-2022 at 12h00 UTC.
>> >> Since we're piggybacking on github PRs here, please use the PR review process >> to approve (click on Review Changes > Approve), rather than sending a "vote: >> yes" email reply that would be normal for a CFV. > > Approved. Thanks @vnkozlov and all the other reviewers. ------------- PR: https://git.openjdk.java.net/jdk/pull/7281 From kbarrett at openjdk.java.net Tue Feb 15 06:54:13 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 15 Feb 2022 06:54:13 GMT Subject: Integrated: 8280916: Simplify HotSpot Style Guide editorial changes In-Reply-To: References: Message-ID: On Sun, 30 Jan 2022 00:39:20 GMT, Kim Barrett wrote: > Please review this change to the HotSpot Style Guide change process. > > The current process involves gathering consensus among the HotSpot Group > Members. That's fine for changes of substance. But it seems overly weighty > for editorial changes that don't affect the substance of the guide, but only > its clarity or accuracy. > > The proposed change would permit the normal PR process to be used for such > changes, but require the requisite reviewers to additionally be HotSpot Group > Members. > > Note that there have already been a couple of changes that effectively > followed the proposed new process. > https://bugs.openjdk.java.net/browse/JDK-8274169 > https://bugs.openjdk.java.net/browse/JDK-8280182 > > This is a modification of the Style Guide, so rough consensus among the > HotSpot Group members is required to make this change. Only Group members > should vote for approval (via the github PR), though reasoned objections or > comments from anyone will be considered. A decision on this proposal will not > be made before Monday 14-Feb-2022 at 12h00 UTC. > > Since we're piggybacking on github PRs here, please use the PR review process > to approve (click on Review Changes > Approve), rather than sending a "vote: > yes" email reply that would be normal for a CFV. This pull request has now been integrated.
Changeset: 11f943d1 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/11f943d148e7bc8d931c382ff019b3e65a87432e Stats: 13 lines in 2 files changed: 9 ins; 0 del; 4 mod 8280916: Simplify HotSpot Style Guide editorial changes Reviewed-by: dcubed, dholmes, stuefe, stefank, kvn, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/7281 From kbarrett at openjdk.java.net Tue Feb 15 08:05:10 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 15 Feb 2022 08:05:10 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock [v4] In-Reply-To: References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: On Thu, 10 Feb 2022 17:23:42 GMT, Albert Mingkun Yang wrote: >> This PR consists of two commits: >> >> 1. remove `ExpandHeap_lock` in Serial GC code. >> 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > lower case Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/parallel/psOldGen.cpp line 48: > 46: #else > 47: _expand_lock(Mutex::safepoint, "PSOldGenExpand_lock", true) > 48: #endif I think this is the only relative lock rank outside of mutexLocker. I didn't realize that when I suggested the possibility of a private mutex. I'd prefer the definition and rank calculation be left in mutexLocker. src/hotspot/share/gc/parallel/psOldGen.cpp line 179: > 177: // expand. That's okay, we'll just try expanding again. > 178: bool needs_expand = > 179: pointer_delta(object_space()->end(), object_space()->top()) < word_size; This has lost the comment in the `needs_expand` function about the stability of end() and the associated implications for access ordering. I'd like to keep that information. 
src/hotspot/share/gc/parallel/psOldGen.cpp line 283: > 281: size_t size = align_down(bytes, virtual_space()->alignment()); > 282: if (size > 0) { > 283: assert_lock_strong(&_expand_lock); [pre-existing] Redundant with the assert at the beginning of the function. ------------- PR: https://git.openjdk.java.net/jdk/pull/7124 From lucy at openjdk.java.net Tue Feb 15 08:29:16 2022 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 15 Feb 2022 08:29:16 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v2] In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 12:11:54 GMT, Claes Redestad wrote: >> I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range which only has positive bytes. This allows for dealing much more efficiently with those `byte[]`s that have an ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input. >> >> Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904 >> >> - Only implemented on x86 for now, but I want to verify that implementations of `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. This pretty much means holding up this until it's implemented on all platforms, which can either be contributed to this PR or as dependent follow-ups. >> >> - An alternative to holding up until all platforms are on board is to allow the implementation of `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented.
One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field. >> >> - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`. Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even see a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix). > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 23 additional commits since the last revision: > > - Merge branch 'master' into count_positives > - Restore partial vector checks in AVX2 and SSE intrinsic variants > - Let countPositives use hasNegatives to allow ports not implementing the countPositives intrinsic to stay neutral > - Simplify changes to encodeUTF8 > - Fix little-endian error caught by testing > - Reduce jumps in the ascii path > - Remove unused tail_mask > - Remove has_negatives intrinsic on x86 (and hook up 32-bit x86 to use count_positives) > - Add more comments, simplify tail branching in AVX512 variant > - Resolve issues in the precise implementation > - ... and 13 more: https://git.openjdk.java.net/jdk/compare/d4fb8919...c4bb3612 Hi Claes, I'm working on the s390 implementation. I hoped to have it ready, but tests are failing. I'll post a PR (similar to Martin's) once I believe my work is worth to be looked at. Just for clarification: the return value must be the index of the first negative byte? 
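For readers following the return-value question, here is a minimal scalar sketch of the semantics as the PR description above words them ("the number of leading bytes in the `byte[]` range which only has positive bytes"). It is an illustration only, not the JDK source: the class name `CountPositivesSketch`, the `(ba, off, len)` signature, and the choice to count relative to `off` are assumptions made for the example.

```java
// Hypothetical scalar reference for the semantics under discussion.
// Not the JDK implementation; names and signature are assumed.
final class CountPositivesSketch {

    // Returns how many leading bytes in ba[off, off + len) are non-negative,
    // i.e. the offset (relative to off) of the first negative byte,
    // or len if the range contains no negative byte.
    static int countPositives(byte[] ba, int off, int len) {
        int limit = off + len;
        for (int i = off; i < limit; i++) {
            if (ba[i] < 0) {
                return i - off;
            }
        }
        return len;
    }

    // The old predicate can be expressed on top of the count.
    static boolean hasNegatives(byte[] ba, int off, int len) {
        return countPositives(ba, off, len) != len;
    }

    public static void main(String[] args) {
        byte[] mixed = {'a', 'b', (byte) 0xE9, 'c'};
        System.out.println(countPositives(mixed, 0, mixed.length)); // prints 2
        System.out.println(hasNegatives(mixed, 0, 2));              // prints false
    }
}
```

Expressing `hasNegatives` through the count mirrors the "Let countPositives use hasNegatives ..." direction only loosely; which method calls which in the actual patch depends on the intrinsic available on each port.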
------------- PR: https://git.openjdk.java.net/jdk/pull/7231 From shade at openjdk.java.net Tue Feb 15 09:50:32 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 15 Feb 2022 09:50:32 GMT Subject: RFR: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler Message-ID: Similar to [JDK-8281744](https://bugs.openjdk.java.net/browse/JDK-8281744), this change improves `TemplateInterpreterGenerator::generate_slow_signature_handler`: there are only a few moves between the jumps, and we can tell `MacroAssembler` those can be short. This code is used to process arguments after the slow call to VM, so the performance improvement is drowned by the call itself. This makes interpreter code a bit more compact, though. Additional testing: - [x] Linux x86_64 fastdebug `hotspot:tier1` - [x] Linux x86_32 fastdebug `hotspot:tier1` ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/7475/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7475&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281815 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/7475.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7475/head:pull/7475 PR: https://git.openjdk.java.net/jdk/pull/7475 From ayang at openjdk.java.net Tue Feb 15 10:09:55 2022 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 15 Feb 2022 10:09:55 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock [v5] In-Reply-To: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: > This PR consists of two commits: > > 1. remove `ExpandHeap_lock` in Serial GC code. > 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. 
> > Test: tier1-6 Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: - rename - revert review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7124/files - new: https://git.openjdk.java.net/jdk/pull/7124/files/d5a2a9ca..ef078740 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7124&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7124&range=03-04 Stats: 44 lines in 6 files changed: 25 ins; 10 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/7124.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7124/head:pull/7124 PR: https://git.openjdk.java.net/jdk/pull/7124 From ayang at openjdk.java.net Tue Feb 15 10:09:55 2022 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 15 Feb 2022 10:09:55 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock [v4] In-Reply-To: References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: On Thu, 10 Feb 2022 17:23:42 GMT, Albert Mingkun Yang wrote: >> This PR consists of two commits: >> >> 1. remove `ExpandHeap_lock` in Serial GC code. >> 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > lower case I have moved the lock back to `mutexLocker` and used the name `PSOldGenExpand_lock`. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7124 From kbarrett at openjdk.java.net Tue Feb 15 10:37:06 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 15 Feb 2022 10:37:06 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock [v5] In-Reply-To: References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: On Tue, 15 Feb 2022 10:09:55 GMT, Albert Mingkun Yang wrote: >> This PR consists of two commits: >> >> 1. remove `ExpandHeap_lock` in Serial GC code. >> 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: > > - rename > - revert review Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7124 From redestad at openjdk.java.net Tue Feb 15 10:41:10 2022 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 15 Feb 2022 10:41:10 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v2] In-Reply-To: References: Message-ID: On Tue, 15 Feb 2022 08:25:29 GMT, Lutz Schmidt wrote: > Hi Claes, I'm working on the s390 implementation. Awesome, thanks! > > Just for clarification: the return value must be the index of the first negative byte? Yes, or the length if there are no such bytes. I've considered (and am still considering) writing the spec of `countPositives` to allow intrinsics to do an early return of a value that is less than the index if it's prohibitively expensive or complicated to implement the intrinsic to be precise in the case where it finds a negative byte. While it must be precise w.r.t. 
returning the full length if it's all positive bytes, no call site would break if the intrinsic returned 0 or some convenient number less than the first negative index (my first experiments with the x86 intrinsic did it like this, but since the semantics of the intrinsic would then differ from the java code I was asked to try and make it precise). The aarch64 algorithm is proving to be a challenge to work with and I might ask again for some leeway in a first implementation there. ------------- PR: https://git.openjdk.java.net/jdk/pull/7231 From sjohanss at openjdk.java.net Tue Feb 15 11:02:09 2022 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 15 Feb 2022 11:02:09 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock [v5] In-Reply-To: References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: On Tue, 15 Feb 2022 10:09:55 GMT, Albert Mingkun Yang wrote: >> This PR consists of two commits: >> >> 1. remove `ExpandHeap_lock` in Serial GC code. >> 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: > > - rename > - revert review Marked as reviewed by sjohanss (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/7124 From lucy at openjdk.java.net Tue Feb 15 11:24:18 2022 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 15 Feb 2022 11:24:18 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v2] In-Reply-To: References: Message-ID: On Fri, 11 Feb 2022 12:11:54 GMT, Claes Redestad wrote: >> I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. 
The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range which only has positive bytes. This allows for dealing much more efficiently with those `byte[]`s that have an ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input. >> >> Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904 >> >> - Only implemented on x86 for now, but I want to verify that implementations of `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. This pretty much means holding up this until it's implemented on all platforms, which can either be contributed to this PR or as dependent follow-ups. >> >> - An alternative to holding up until all platforms are on board is to allow the implementation of `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented. One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field. >> >> - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`. Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even see a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix). > > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 23 additional commits since the last revision: > > - Merge branch 'master' into count_positives > - Restore partial vector checks in AVX2 and SSE intrinsic variants > - Let countPositives use hasNegatives to allow ports not implementing the countPositives intrinsic to stay neutral > - Simplify changes to encodeUTF8 > - Fix little-endian error caught by testing > - Reduce jumps in the ascii path > - Remove unused tail_mask > - Remove has_negatives intrinsic on x86 (and hook up 32-bit x86 to use count_positives) > - Add more comments, simplify tail branching in AVX512 variant > - Resolve issues in the precise implementation > - ... and 13 more: https://git.openjdk.java.net/jdk/compare/cee17570...c4bb3612 Well, with the existing implementations for ppc and s390, I do not see a complexity advantage with a relaxed spec. The code would have to be there anyway. When it comes to cost, the worst case would be an array of length n, a loop unroll factor of (u==n) and the first (and only) negative byte at index (n-1). All bytes would then be checked twice. With growing n, the overhead diminishes. After all, you want profile-based stub generation - with actual load matching the profile, of course. ------------- PR: https://git.openjdk.java.net/jdk/pull/7231 From ayang at openjdk.java.net Tue Feb 15 12:27:12 2022 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 15 Feb 2022 12:27:12 GMT Subject: RFR: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock [v5] In-Reply-To: References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: On Tue, 15 Feb 2022 10:09:55 GMT, Albert Mingkun Yang wrote: >> This PR consists of two commits: >> >> 1. remove `ExpandHeap_lock` in Serial GC code. >> 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. 
>> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: > > - rename > - revert review Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/7124 From ayang at openjdk.java.net Tue Feb 15 12:27:12 2022 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 15 Feb 2022 12:27:12 GMT Subject: Integrated: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock In-Reply-To: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> References: <6zRTvGcJCD7VNEf1_U5RkTE9lg6I3mFFQYKtAb3WRqo=.e5df3ea9-693d-42ba-a7e7-7724f9fc3ad1@github.com> Message-ID: On Tue, 18 Jan 2022 12:03:46 GMT, Albert Mingkun Yang wrote: > This PR consists of two commits: > > 1. remove `ExpandHeap_lock` in Serial GC code. > 2. rename it to `ParallelExpandHeap_lock` to indicate it's Parallel-GC only. > > Test: tier1-6 This pull request has now been integrated. Changeset: bc614840 Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/bc6148407e629bd99fa5a8577ebd90320610f349 Stats: 24 lines in 7 files changed: 8 ins; 3 del; 13 mod 8280136: Serial: Remove unnecessary use of ExpandHeap_lock Reviewed-by: iwalulya, kbarrett, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/7124 From redestad at openjdk.java.net Tue Feb 15 13:45:07 2022 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 15 Feb 2022 13:45:07 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v2] In-Reply-To: References: Message-ID: On Tue, 15 Feb 2022 11:20:55 GMT, Lutz Schmidt wrote: > Well, with the existing implementations for ppc and s390, I do not see a complexity advantage with a relaxed spec. The code would have to be there anyway. 
Same for x86, but we could avoid going into and checking the tail on a negative byte in a vector and instead do an early return that returns `N * vector_size`, where `N` is the number of vectors we've checked that were all positive. This could save a few ns in some cases. > > When it comes to cost, the worst case would be an array of length n, a loop unroll factor of (u==n) and the first (and only) negative byte at index (n-1). All bytes would then be checked twice. With growing n, the overhead diminishes. After all, you want profile-based stub generation - with actual load matching the profile, of course. Sounds about right. I've explored the cost of this in a few microbenchmarks. In `StringDecode/-Encode` such double-checking would happen anyhow later on in the java code. So for most of the prominent uses such double-checking is performance neutral even in the worst case. There are a few places where we don't productively use the count and continue to lean on a `hasNegatives` predicate which calls into `countPositives`. This will mean a small amount of useless computation on certain inputs. For the 16- and 32-byte vectors I've benchmarked extensively on x86 (AVX2) the worst case overhead landed in the vicinity of 20 cycles (7.5-15ns @ 2.4GHz). Allowing for imprecision _could_ improve a few such corner cases, but I've not found a performance sensitive place where it would really matter. ------------- PR: https://git.openjdk.java.net/jdk/pull/7231 From stuefe at openjdk.java.net Tue Feb 15 16:00:07 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 15 Feb 2022 16:00:07 GMT Subject: RFR: JDK-8281015: Further simplify NMT backend In-Reply-To: References: Message-ID: On Mon, 31 Jan 2022 08:12:02 GMT, Thomas Stuefe wrote: > NMT backend can be further simplified and cleaned out. > > - some entry points require NMT_TrackingLevel as arguments, some use the global tracking level.
Ultimately, every part of NMT always uses the global tracking level, so in many cases the explicit parameter can be removed and the global tracking level can be used instead. > - `MemTracker::malloc_header_size(level)` + `MemTracker::malloc_footer_size(level)` are fused into `MemTracker::overhead_per_malloc()` > - when adding to `MallocSiteTable`, caller gets back a shortcut to the entry. That shortcut is stored verbatim in the malloc header. It consists of two 16-bit values (bucket index and chain position). That tuple finds its way into many argument lists. It can be simplified into a single 32-bit opaque marker. Code outside the MallocSiteTable does not need to know what it is. > - Currently, the `MallocHeader` class contains a lot of logic. It accounts (in constructor) and de-accounts (in `MallocHeader::release()`). It would simplify code if `MallocHeader` were just a dumb data carrier and the `MallocTracker` would do the actual work. > - `MallocHeader` can be simplified, almost all members made constant and modifying accessors removed. > - In some places we handle inputptr=NULL gracefully where we should assert instead > - Expressions like `MemTracker::tracking_level() != NMT_off` can be simplified to `MemTracker::enabled()`. > - MemTracker::malloc_base (all variants) can be removed. Note that we have MallocTracker::malloc_header, which achieves the same and does not require casting to the header. > > Testing: > > - GHAs > - manually ran NMT gtests (all NMT modes) and NMT jtreg tests on Ubuntu x64 > - SAP nightlies ran through. Note that since 8275301 "Unify C-heap buffer overrun checks into NMT" NMT is enabled by default in debug builds, so it gets a lot more workout in tests now. > > Note that I wanted to manually verify that the gdb "call pp" command still works in order to not break Zhengyu's recent addition, but found it's already broken. I filed https://bugs.openjdk.java.net/browse/JDK-8281023 and am preparing a separate patch.
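The "two 16-bit values into one 32-bit opaque marker" idea from the description above can be sketched as follows. This is a language-neutral illustration written in Java, not the HotSpot C++ code: the class `SiteMarkerSketch`, its method names, and the choice of putting the bucket index in the high half are all assumptions.

```java
// Hypothetical illustration of fusing (bucket index, chain position),
// each 16 bits wide, into one opaque 32-bit marker. Not HotSpot code.
final class SiteMarkerSketch {

    // Packs the two 16-bit values; the layout (bucket in the high half)
    // is an arbitrary choice for this sketch.
    static int pack(int bucketIndex, int chainPosition) {
        if ((bucketIndex & ~0xFFFF) != 0 || (chainPosition & ~0xFFFF) != 0) {
            throw new IllegalArgumentException("both values must fit in 16 bits");
        }
        return (bucketIndex << 16) | chainPosition;
    }

    // Only the table itself would need these two accessors;
    // every other caller just passes the marker along unchanged.
    static int bucketIndexOf(int marker) {
        return marker >>> 16;
    }

    static int chainPositionOf(int marker) {
        return marker & 0xFFFF;
    }

    public static void main(String[] args) {
        int marker = pack(0xABCD, 0x0012);
        System.out.println(Integer.toHexString(marker)); // prints abcd0012
        System.out.println(bucketIndexOf(marker));       // prints 43981
        System.out.println(chainPositionOf(marker));     // prints 18
    }
}
```

The point of the opaque form is that argument lists carry one value instead of two, and only the table needs to know how to split it.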
Tested at SAP for 14 days, no problems. Any opinions? Should I reduce this patch, or split it into parts to make it more palatable? ------------- PR: https://git.openjdk.java.net/jdk/pull/7283 From shade at openjdk.java.net Tue Feb 15 16:45:08 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 15 Feb 2022 16:45:08 GMT Subject: RFR: 8281744: x86: Use short jumps in TIG::set_vtos_entry_points In-Reply-To: References: Message-ID: On Mon, 14 Feb 2022 15:47:41 GMT, Aleksey Shipilev wrote: > Performance in `-Xint` mode seems to be bottlenecked on the code size, rather than particular instruction hotspots, which means code density is important. > > There are forward branches in `TemplateInterpreterGenerator::set_vtos_entry_points`, which cannot be shortened by `MacroAssembler`, unless we tell it specifically that the upcoming branch target would be within the 8-bit offset. Which it apparently is in this particular case, because there are just a handful of `push`-es between the jump and its target. If a jump offset is more than 8 bits, the interpreter would catch fire just about everywhere, since `set_vtos_entry_points` is used at every bytecode entry. `fastdebug` builds assert the offset sanity directly. > > Current patch improves `SPECjvm2008` performance in `-Xint` mode on Ryzen 7 5700G: > > > compiler.compiler: +4.1% > compiler.sunflow: +4.7% > compress: +9.9% > crypto.signverify: +5.2% > scimark.fft.large: +9.5% > scimark.fft.small: +10.1% > serial: +7.3% > xml.transform: +7.1% > xml.validation: +3.3% > > > There are other places in template interpreter where forward jumps can be short, I'll do them separately, since they are riskier and also less important. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1` > - [x] Linux x86_32 fastdebug, `tier1` Thanks for reviews! 
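As background for the short-jump changes discussed in this digest (8281744, 8281815), here is a small sketch of why the 8-bit offset matters for code density on x86: an unconditional `jmp` with an 8-bit displacement encodes in 2 bytes, while the 32-bit displacement form takes 5 bytes. The class and method names are made up for the example, and the displacement is simply taken as given; this is not how `MacroAssembler` decides, since for forward branches the caller has to promise that the target is near.

```java
// Hypothetical illustration of x86 jump encoding sizes; not HotSpot code.
final class ShortJumpSketch {

    // A short jump carries a signed 8-bit displacement: -128..127.
    static boolean fitsInRel8(int displacement) {
        return displacement >= Byte.MIN_VALUE && displacement <= Byte.MAX_VALUE;
    }

    // Encoded size of an unconditional jmp:
    // EB cb (rel8) is 2 bytes, E9 cd (rel32) is 5 bytes.
    static int jmpSizeInBytes(int displacement) {
        return fitsInRel8(displacement) ? 2 : 5;
    }

    public static void main(String[] args) {
        System.out.println(jmpSizeInBytes(12));   // prints 2
        System.out.println(jmpSizeInBytes(127));  // prints 2
        System.out.println(jmpSizeInBytes(128));  // prints 5
        System.out.println(jmpSizeInBytes(-129)); // prints 5
    }
}
```

Saving 3 bytes per jump is small in isolation, but the PR description for 8281744 notes that `set_vtos_entry_points` is used at every bytecode entry, which is where the density gain adds up.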
------------- PR: https://git.openjdk.java.net/jdk/pull/7463 From shade at openjdk.java.net Tue Feb 15 16:45:08 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 15 Feb 2022 16:45:08 GMT Subject: Integrated: 8281744: x86: Use short jumps in TIG::set_vtos_entry_points In-Reply-To: References: Message-ID: On Mon, 14 Feb 2022 15:47:41 GMT, Aleksey Shipilev wrote: > Performance in `-Xint` mode seems to be bottlenecked on the code size, rather than particular instruction hotspots, which means code density is important. > > There are forward branches in `TemplateInterpreterGenerator::set_vtos_entry_points`, which cannot be shortened by `MacroAssembler`, unless we tell it specifically that the upcoming branch target would be within the 8-bit offset. Which it apparently is in this particular case, because there are just a handful of `push`-es between the jump and its target. If a jump offset is more than 8 bits, the interpreter would catch fire just about everywhere, since `set_vtos_entry_points` is used at every bytecode entry. `fastdebug` builds assert the offset sanity directly. > > Current patch improves `SPECjvm2008` performance in `-Xint` mode on Ryzen 7 5700G: > > > compiler.compiler: +4.1% > compiler.sunflow: +4.7% > compress: +9.9% > crypto.signverify: +5.2% > scimark.fft.large: +9.5% > scimark.fft.small: +10.1% > serial: +7.3% > xml.transform: +7.1% > xml.validation: +3.3% > > > There are other places in template interpreter where forward jumps can be short, I'll do them separately, since they are riskier and also less important. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1` > - [x] Linux x86_32 fastdebug, `tier1` This pull request has now been integrated. 
Changeset: 18704653 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/18704653dcc76b6360b746a6a9c20d614633da0e Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod 8281744: x86: Use short jumps in TIG::set_vtos_entry_points Reviewed-by: rehn, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/7463 From hseigel at openjdk.java.net Tue Feb 15 18:22:00 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 15 Feb 2022 18:22:00 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v6] In-Reply-To: References: Message-ID: > Please review this new attempt to resolve JDK-8214976. This fix adds Pragmas to generate compilation errors, when using gcc, if calling a native system function instead of the os:: version of the function. The fix includes changes to calls in non-shared code because it is cleaner than adding PRAGMAs and, for some cases, the os:: version of a function has added value, such as asserts and RESTARTABLE. This fix slightly changes the signature of os::abort() so it wouldn't conflict with native abort() functions. Changes to Windows code is left for a future RFE. > > This fix was tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, Mach5 tiers 3-5 on Linux x64, and Mach5 builds of Zero, PPC, and s390. 
> > Thanks, Harold Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: change lseek64() calls to os::lseek() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7248/files - new: https://git.openjdk.java.net/jdk/pull/7248/files/d062fb50..e0abfdb4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7248&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7248&range=04-05 Stats: 9 lines in 1 file changed: 0 ins; 4 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/7248.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7248/head:pull/7248 PR: https://git.openjdk.java.net/jdk/pull/7248 From hseigel at openjdk.java.net Tue Feb 15 18:29:14 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Tue, 15 Feb 2022 18:29:14 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v5] In-Reply-To: References: Message-ID: <2YciZFs9m7FUZHg0s3Eun8s-XFoeG2lj9-UEva1uXTw=.3a5d6d11-13ea-4c56-87aa-563c8f23ed6b@github.com> On Mon, 14 Feb 2022 22:37:39 GMT, David Holmes wrote: >> Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: >> >> rename macro, fix semi-colon issue, fix zero lseek64 and ftruncate64 build issue > > src/hotspot/os/linux/os_linux.cpp line 4924: > >> 4922: } >> 4923: >> 4924: off64_t call_lseek64(int fd, off64_t offset, int whence) { > > I think it would be better to just change the `lseek64` calls to `os::lseek` rather than introduce this wrapper function. Thanks David. I changed the code to call os::lseek(). 
------------- PR: https://git.openjdk.java.net/jdk/pull/7248 From duke at openjdk.java.net Tue Feb 15 19:01:08 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Tue, 15 Feb 2022 19:01:08 GMT Subject: Integrated: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts In-Reply-To: References: Message-ID: On Sat, 5 Feb 2022 15:34:08 GMT, Quan Anh Mai wrote: > Hi, > > This patch implements the unsigned upcast intrinsics in x86, which are used in vector lane-wise reinterpreting operations. > > Thank you very much. This pull request has now been integrated. Changeset: 0af356bb Author: Quan Anh Mai Committer: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/0af356bb4bfee99223d4bd4f8b0001c5f362c150 Stats: 490 lines in 19 files changed: 428 ins; 24 del; 38 mod 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts Reviewed-by: psandoz, sviswanathan ------------- PR: https://git.openjdk.java.net/jdk/pull/7358 From dholmes at openjdk.java.net Tue Feb 15 22:12:17 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 15 Feb 2022 22:12:17 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v6] In-Reply-To: References: Message-ID: <6rDE2Bo8OEdvlJ5uIQsJL3flAH-AgCpkXk03D5hd4dQ=.8545e589-cc2e-4f23-b439-64b7a18d98ab@github.com> On Tue, 15 Feb 2022 18:22:00 GMT, Harold Seigel wrote: >> Please review this new attempt to resolve JDK-8214976. This fix adds Pragmas to generate compilation errors, when using gcc, if calling a native system function instead of the os:: version of the function. The fix includes changes to calls in non-shared code because it is cleaner than adding PRAGMAs and, for some cases, the os:: version of a function has added value, such as asserts and RESTARTABLE. This fix slightly changes the signature of os::abort() so it wouldn't conflict with native abort() functions. Changes to Windows code is left for a future RFE. 
>> >> This fix was tested with Mach5 tiers 1-2 on Linux, Mac OS, and Windows, Mach5 tiers 3-5 on Linux x64, and Mach5 builds of Zero, PPC, and s390. >> >> Thanks, Harold > > Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: > > change lseek64() calls to os::lseek() Thanks Harold, This seems acceptable to me now. Only remaining issue is the placement issue Kim raised - see query/suggestion below. David ------------- PR: https://git.openjdk.java.net/jdk/pull/7248 From dholmes at openjdk.java.net Tue Feb 15 22:12:18 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 15 Feb 2022 22:12:18 GMT Subject: RFR: 8214976: Warn about uses of functions replaced for portability [v6] In-Reply-To: References: Message-ID: On Fri, 28 Jan 2022 22:40:45 GMT, Kim Barrett wrote: >> Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: >> >> change lseek64() calls to os::lseek() > > src/hotspot/share/utilities/compilerWarnings_gcc.hpp line 109: > >> 107: FORBID_C_FUNCTION(ssize_t write(int, const void*, size_t ), "use os::write"); >> 108: >> 109: FORBID_C_FUNCTION(char* strtok(char*, const char*), "use strtok_r"); > > Some of these functions are portable and ought to be forbidden in a platform agnostic location, so the restriction also applies if/when we have real support on other platforms. I think almost none are gcc (or clang) specific, but are instead probably posix and not windows, so maybe should go in a different place as well. Basically I think the structure / placement considerations need some more work. Can we put the list of forbidden functions in os.hpp? 
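For readers following the placement question, here is a minimal sketch of what a single, platform-agnostic list of forbidden functions could look like. It rests on loud assumptions: `SKETCH_FORBID`, `raw_copy_bytes` and `os_sketch::copy_bytes` are hypothetical names invented for illustration, and the standard `[[deprecated]]` attribute merely stands in for whatever diagnostic mechanism `FORBID_C_FUNCTION` really uses. None of it is taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical macro: tags a declaration so that any direct call is diagnosed.
// [[deprecated]] is a portable stand-in; it is not the patch's mechanism.
#define SKETCH_FORBID(alternative) [[deprecated("forbidden here; " alternative)]]

// Hypothetical "raw" routine playing the role of a native system call.
SKETCH_FORBID("use os_sketch::copy_bytes")
inline std::size_t raw_copy_bytes(char* dst, const char* src, std::size_t n) {
  std::memcpy(dst, src, n);
  return n;
}

namespace os_sketch {

// The one sanctioned wrapper. The diagnostic is silenced locally, on
// compilers that understand GCC-style diagnostic pragmas.
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#endif
inline std::size_t copy_bytes(char* dst, const char* src, std::size_t n) {
  assert(dst != nullptr && src != nullptr);  // the wrapper can add checks
  return raw_copy_bytes(dst, src, n);
}
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic pop
#endif

}  // namespace os_sketch
```

Because the tag sits next to the declaration rather than in a compiler-specific header, every platform that includes it sees the same restriction. A direct call to `raw_copy_bytes` outside the wrapper would draw a deprecation diagnostic, which a warnings-as-errors build turns into a hard failure.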
------------- PR: https://git.openjdk.java.net/jdk/pull/7248 From duke at openjdk.java.net Wed Feb 16 07:59:08 2022 From: duke at openjdk.java.net (KIRIYAMA Takuya) Date: Wed, 16 Feb 2022 07:59:08 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. In-Reply-To: References: Message-ID: On Thu, 10 Feb 2022 19:05:41 GMT, Markus Grönlund wrote: >> I think JFR should report an error message and the JVM should shut down safely instead of hitting a guarantee failure. >> >> For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops the JVM as below >> by using JfrJavaSupport::abort(). >> >> [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) >> [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) >> [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... >> >> I modified StreamWriterHost to call JfrJavaSupport::abort() instead of failing a guarantee. >> I added an argument to JfrJavaSupport::abort() which tells os::abort() not to write a core file >> because there is no space left on the device. >> Could you please review the fix? > > src/hotspot/share/jfr/jni/jfrJavaSupport.hpp line 103: > >> 101: >> 102: // critical >> 103: static void abort(jstring errorMsg, TRAPS, bool dump_core=true); > > Not sure this is necessary. The existing core dump logic already handles the case where a core file cannot be generated due to disk full. Thank you for your review. Whether or not HotSpot generates a core file is determined by the argument of vm_abort(bool dump_core). If the argument is "true", vm_abort(bool dump_core) calls os::abort(bool dump_core) to generate a core file.
See the following code: https://github.com/openjdk/jdk/blob/3c160ab5bec0c2364ec3f43c5a5789098d4699e5/src/hotspot/share/runtime/java.cpp#L625 I think JfrJavaSupport::abort() should pass "false" as an argument to vm_abort(bool dump_core). > test/hotspot/jtreg/runtime/jfr/TestJFRDiskFull.java line 127: > >> 125: raf.close(); >> 126: } >> 127: } > > I appreciate the effort, but we can't have a test that intentionally provokes a disk full situation. Instead, the updated error message will have to be manually verified. I use `@run main/manual` in TestJFRDiskFull.java. I think this tag marks the test as manual. I manually confirmed that this test passes with jtreg after this fix. ------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From shade at openjdk.java.net Wed Feb 16 08:04:15 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 16 Feb 2022 08:04:15 GMT Subject: RFR: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler In-Reply-To: References: Message-ID: <_CQho3N-3xjNk1Pm-KRzhR9q0ZEhGnOQYppdFmc0EWg=.4dc9823d-994a-4a93-88a7-777579daff1c@github.com> On Tue, 15 Feb 2022 09:40:28 GMT, Aleksey Shipilev wrote: > Similar to [JDK-8281744](https://bugs.openjdk.java.net/browse/JDK-8281744), this change improves `TemplateInterpreterGenerator::generate_slow_signature_handler`: there are only a few moves between the jumps, and we can tell `MacroAssembler` those can be short. This code is used to process arguments after the slow call to VM, so the performance improvement is drowned by the call itself. This makes interpreter code a bit more compact, though.
> > Additional testing: > - [x] Linux x86_64 fastdebug `hotspot:tier1` > - [x] Linux x86_32 fastdebug `hotspot:tier1` The GHA failure on x86_32 is new and unrelated: https://bugs.openjdk.java.net/browse/JDK-8281822 ------------- PR: https://git.openjdk.java.net/jdk/pull/7475 From mgronlun at openjdk.java.net Wed Feb 16 10:28:12 2022 From: mgronlun at openjdk.java.net (Markus Grönlund) Date: Wed, 16 Feb 2022 10:28:12 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. In-Reply-To: References: Message-ID: On Wed, 16 Feb 2022 07:54:26 GMT, KIRIYAMA Takuya wrote: >> src/hotspot/share/jfr/jni/jfrJavaSupport.hpp line 103: >> >>> 101: >>> 102: // critical >>> 103: static void abort(jstring errorMsg, TRAPS, bool dump_core=true); >> >> Not sure this is necessary. The existing core dump logic already handles the case where a core file cannot be generated due to disk full. > > Thank you for your review. > > Whether or not HotSpot generates a core file is determined by the argument of vm_abort(bool dump_core). If the argument is "true", vm_abort(bool dump_core) calls os::abort(bool dump_core) to generate a core file. > See the following code: > https://github.com/openjdk/jdk/blob/3c160ab5bec0c2364ec3f43c5a5789098d4699e5/src/hotspot/share/runtime/java.cpp#L625 > > I think JfrJavaSupport::abort() should pass "false" as an argument to vm_abort(bool dump_core). Ok. My point was that the OS won't be able to create a core file if there is no available space. But this is indeed more succinct, if we don't want to create a core categorically from this location. >> test/hotspot/jtreg/runtime/jfr/TestJFRDiskFull.java line 127: >> >>> 125: raf.close(); >>> 126: } >>> 127: } >> >> I appreciate the effort, but we can't have a test that intentionally provokes a disk full situation. Instead, the updated error message will have to be manually verified. > > I use `@run main/manual` in TestJFRDiskFull.java.
I think this tag marks the test as manual. > I manually confirmed that this test passes with jtreg after this fix. My apologies, I missed the @run main/manual decoration. I don't think we have any JFR tests that use it. If you can ensure this test is excluded for automatic runs, then perhaps...but then I don't know who will get to run it, so the value of the test is questionable. ------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From jbhateja at openjdk.java.net Wed Feb 16 11:05:07 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 16 Feb 2022 11:05:07 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v4] In-Reply-To: References: Message-ID: <1dqqh2KXNKFtAUuCMwvI9mLA0jFw--Bqz-AEfrxq_NM=.1b9f677e-3798-4877-9b58-8afdc8ed64ac@github.com> > Summary of changes: > - Intrinsify Math.round(float) and Math.round(double) APIs. > - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. > - Test creation using new IR testing framework. > > The following are the performance numbers of a JMH micro included with the patch > > Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) > >
> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | AVX3 gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | AVX2 gain ratio
> -- | -- | -- | -- | -- | -- | -- | --
> FpRoundingBenchmark.test_round_double | 1024.00 | 584.99 | 1870.70 | 3.20 | 510.35 | 548.60 | 1.07
> FpRoundingBenchmark.test_round_double | 2048.00 | 257.17 | 965.33 | 3.75 | 293.60 | 273.15 | 0.93
> FpRoundingBenchmark.test_round_float | 1024.00 | 825.69 | 3592.54 | 4.35 | 825.32 | 1836.42 | 2.23
> FpRoundingBenchmark.test_round_float | 2048.00 | 388.55 | 1895.77 | 4.88 | 412.31 | 945.82 | 2.29
> > > Kindly review and share your feedback.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8279508: Replacing by efficient instruction sequence based on MXCSR.RC mode. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7094/files - new: https://git.openjdk.java.net/jdk/pull/7094/files/2dc364fa..1c9ff777 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=02-03 Stats: 143 lines in 4 files changed: 4 ins; 82 del; 57 mod Patch: https://git.openjdk.java.net/jdk/pull/7094.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094 PR: https://git.openjdk.java.net/jdk/pull/7094 From jbhateja at openjdk.java.net Wed Feb 16 12:30:27 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 16 Feb 2022 12:30:27 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v5] In-Reply-To: References: Message-ID: <-NfiIwcnrf7TRNxA9x1d9itPvKYgeCYogpjSZgGYtvc=.15346702-2db7-4295-8e5a-a4864f3bbdbd@github.com> > Summary of changes: > - Intrinsify Math.round(float) and Math.round(double) APIs. > - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. > - Test creation using new IR testing framework. 
> > The following are the performance numbers of a JMH micro included with the patch > > Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) > >
> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | AVX3 gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | AVX2 gain ratio
> -- | -- | -- | -- | -- | -- | -- | --
> FpRoundingBenchmark.test_round_double | 1024.00 | 584.99 | 1870.70 | 3.20 | 510.35 | 548.60 | 1.07
> FpRoundingBenchmark.test_round_double | 2048.00 | 257.17 | 965.33 | 3.75 | 293.60 | 273.15 | 0.93
> FpRoundingBenchmark.test_round_float | 1024.00 | 825.69 | 3592.54 | 4.35 | 825.32 | 1836.42 | 2.23
> FpRoundingBenchmark.test_round_float | 2048.00 | 388.55 | 1895.77 | 4.88 | 412.31 | 945.82 | 2.29
> > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - 8279508: Adding few descriptive comments. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508 - 8279508: Replacing by efficient instruction sequence based on MXCSR.RC mode. - 8279508: Adding vectorized algorithms to match the semantics of rounding operations. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508 - 8279508: Adding a test for scalar intrinsification.
- 8279508: Auto-vectorize Math.round API ------------- Changes: https://git.openjdk.java.net/jdk/pull/7094/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=04 Stats: 739 lines in 23 files changed: 648 ins; 29 del; 62 mod Patch: https://git.openjdk.java.net/jdk/pull/7094.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094 PR: https://git.openjdk.java.net/jdk/pull/7094 From jbhateja at openjdk.java.net Wed Feb 16 12:30:28 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 16 Feb 2022 12:30:28 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v3] In-Reply-To: References: Message-ID: On Mon, 14 Feb 2022 17:14:10 GMT, Jatin Bhateja wrote: >> That pseudocode would make a very useful comment too. This whole patch is very thinly commented. > >> > Hi, IIRC for evex encoding you can embed the RC control bit directly in the evex prefix, removing the need to rely on global MXCSR register. Thanks. >> >> Hi @merykitty , You are correct, we can embed RC mode in instruction encoding of round instruction (towards -inf,+inf, zero). But to match the semantics of Math.round API one needs to add 0.5[f] to input value and then perform rounding over resultant value, which is why @sviswa7 suggested to use a global rounding mode driven by MXCSR.RC so that intermediate floating inexact values are resolved as desired, but OOO execution may misplace LDMXCSR and hence may have undesired side effects. > > **Just want to correct above statement, LDMXCSR will not be re-ordered/re-scheduled early OOO backend.** > That pseudocode would make a very useful comment too. This whole patch is very thinly commented. I have replaced earlier bulky sequence, new sequence is having similar performance but reduction in code may improve inlining behavior. Added descriptive comments around the special cases. 
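The quoted remark that one adds 0.5 to the input and then rounds can be pictured with a small scalar sketch. It is illustrative only and is not the code from this patch: `round_half_up_naive` and `round_half_up_careful` are hypothetical helpers, the sketch assumes IEEE-754 double arithmetic in the default round-to-nearest-even mode, and it models only the documented tie-breaking of `Math.round` (ties toward positive infinity), leaving out NaN handling and saturation on overflow.

```cpp
#include <cassert>
#include <cmath>

// Naive formulation: add 0.5, then round toward negative infinity.
// The addition itself is rounded, which matters for inputs just below 0.5.
inline long long round_half_up_naive(double x) {
  return static_cast<long long>(std::floor(x + 0.5));
}

// Tie-aware formulation: take the floor first, then look at the fraction.
// Ties (fraction == 0.5) go toward positive infinity.
inline long long round_half_up_careful(double x) {
  const double f = std::floor(x);
  return static_cast<long long>((x - f >= 0.5) ? f + 1.0 : f);
}

// Round-to-nearest-even, as produced by the default FP rounding mode.
inline double round_half_even(double x) {
  return std::nearbyint(x);
}
```

For the largest double below 0.5 (0.49999999999999994) the sum `x + 0.5` is exactly halfway between two representable values and rounds up to 1.0, so the naive helper returns 1 while the careful one returns 0. On exact ties such as 2.5 and -2.5 both helpers agree (3 and -2), whereas `round_half_even` yields 2.0 and -2.0.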
------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From jbhateja at openjdk.java.net Wed Feb 16 12:40:10 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 16 Feb 2022 12:40:10 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v3] In-Reply-To: References: Message-ID: On Wed, 16 Feb 2022 12:26:45 GMT, Jatin Bhateja wrote: >>> > Hi, IIRC for evex encoding you can embed the RC control bit directly in the evex prefix, removing the need to rely on global MXCSR register. Thanks. >>> >>> Hi @merykitty , You are correct, we can embed RC mode in instruction encoding of round instruction (towards -inf,+inf, zero). But to match the semantics of Math.round API one needs to add 0.5[f] to input value and then perform rounding over resultant value, which is why @sviswa7 suggested to use a global rounding mode driven by MXCSR.RC so that intermediate floating inexact values are resolved as desired, but OOO execution may misplace LDMXCSR and hence may have undesired side effects. >> >> **Just want to correct above statement, LDMXCSR will not be re-ordered/re-scheduled early OOO backend.** > >> That pseudocode would make a very useful comment too. This whole patch is very thinly commented. > > I have replaced earlier bulky sequence, new sequence is having similar performance but reduction in code may improve inlining behavior. Added descriptive comments around the special cases. > There are already `RoundFloat`, `RoundDouble`, and `RoundDoubleMode` nodes defined. > > Though `RoundFloat` and `RoundDouble` are legacy nodes used only on x86-32, `RoundDoubleMode` supports multiple rounding modes and is amenable to auto-vectorization. > > What do you think about the following alternative? > > Reuse `RoundDoubleMode` (with a new rounding mode) and introduce `RoundFloatMode`. > > Special rounding rules is not the only peculiarity of `Math.round()`. It also converts the result to an integral type. 
It can be represented as `ConvF2I (RoundFloatMode f #rmode)` / `ConvD2L (RoundDoubleMode d #rmode)`. In scalar case, it can be matched as a single AD instruction. > > Auto-vectorizer can then convert it to `VectorCastF2X (RoundFloatModeV vf #rmode)` / `VectorCastD2X (RoundDoubleModeV vd #rmode)` and match it in a similar manner. Adding a new rounding mode to RoundDoubleMode may disturb other targets. The match_rule_supported routine operates over opcodes, and currently any target supporting RoundDoubleMode generates code for all the rounding modes. Your solution is in any case based on creating new scalar and vector IR nodes for the floating-point rounding operation, which is what the patch currently does. ------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From dholmes at openjdk.java.net Wed Feb 16 12:45:05 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 16 Feb 2022 12:45:05 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. In-Reply-To: References: Message-ID: On Wed, 26 Jan 2022 06:41:41 GMT, KIRIYAMA Takuya wrote: > I think JFR should report an error message and the JVM should shut down safely instead of hitting a guarantee failure. > > For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops the JVM as below > by using JfrJavaSupport::abort(). > > [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... > > I modified StreamWriterHost to call JfrJavaSupport::abort() instead of failing a guarantee.
> I added an argument to JfrJavaSupport::abort() which tells os::abort() not to write a core file > because there is no space left on the device. > Could you please review the fix? test/hotspot/jtreg/runtime/jfr/TestJFRDiskFull.java line 28: > 26: * @test > 27: * @bug 8280684 > 28: * @summary JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. typo: failes ------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From dholmes at openjdk.java.net Wed Feb 16 12:50:09 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 16 Feb 2022 12:50:09 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. In-Reply-To: References: Message-ID: <5bed6fhWOY90Tpy4sFkixqBN96vvbc9qrN1xOLRzeqI=.1f707da6-6c64-4b8f-9e75-4121199156de@github.com> On Wed, 16 Feb 2022 10:17:00 GMT, Markus Grönlund wrote: >> Thank you for your review. >> >> Whether or not HotSpot generates a core file is determined by the argument of vm_abort(bool dump_core). If the argument is "true", vm_abort(bool dump_core) calls os::abort(bool dump_core) to generate a core file. >> See the following code: >> https://github.com/openjdk/jdk/blob/3c160ab5bec0c2364ec3f43c5a5789098d4699e5/src/hotspot/share/runtime/java.cpp#L625 >> >> I think JfrJavaSupport::abort() should pass "false" as an argument to vm_abort(bool dump_core). > > Ok. My point was that the OS won't be able to create a core file if there is no available space. > > But this is indeed more succinct, if we don't want to create a core categorically from this location. Just an observation but the filesystem that is full, and the filesystem where a core would be written, need not be the same file system. That said, a core dump in this case seems unwarranted. >> I use `@run main/manual` in TestJFRDiskFull.java. I think this tag marks the test as manual. >> I manually confirmed that this test passes with jtreg after this fix.
> > My apologies, I missed the @run main/manual decoration. I don't think we have any JFR tests that use it. > > If you can ensure this test is excluded for automatic runs, then perhaps...but then I don't know who will get to run it, so the value of the test is questionable. Manual tests are excluded if the jtreg test run specifies to run automatic tests only (as we do in our CI). So this really only serves as a validation of the fix, with no real expectation that anyone will necessarily ever run it again. Even as a locally run test, filling the disk can easily lead to unexpected problems for other processes - including the swap/paging file on Windows - so this is a risky test to run. ------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From dholmes at openjdk.java.net Wed Feb 16 12:55:05 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 16 Feb 2022 12:55:05 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. In-Reply-To: References: Message-ID: On Wed, 16 Feb 2022 12:41:52 GMT, David Holmes wrote: >> I think JFR should report an error message and the JVM should shut down safely instead of hitting a guarantee failure. >> >> For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops the JVM as below >> by using JfrJavaSupport::abort(). >> >> [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) >> [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) >> [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... >> >> I modified StreamWriterHost to call JfrJavaSupport::abort() instead of failing a guarantee.
>> I added an argument to JfrJavaSupport::abort() which tells os::abort() not to write a core file >> because there is no space left on the device. >> Could you please review the fix? > > test/hotspot/jtreg/runtime/jfr/TestJFRDiskFull.java line 28: > >> 26: * @test >> 27: * @bug 8280684 >> 28: * @summary JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. > > typo: failes Actually "summary" is meant to describe what the test does, not what the original bug was about. ------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From mgronlun at openjdk.java.net Wed Feb 16 13:08:11 2022 From: mgronlun at openjdk.java.net (Markus Grönlund) Date: Wed, 16 Feb 2022 13:08:11 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. In-Reply-To: References: Message-ID: On Wed, 26 Jan 2022 06:41:41 GMT, KIRIYAMA Takuya wrote: > I think JFR should report an error message and the JVM should shut down safely instead of hitting a guarantee failure. > > For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops the JVM as below > by using JfrJavaSupport::abort(). > > [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... > > I modified StreamWriterHost to call JfrJavaSupport::abort() instead of failing a guarantee. > I added an argument to JfrJavaSupport::abort() which tells os::abort() not to write a core file > because there is no space left on the device. > Could you please review the fix? Takuya, can I suggest keeping your proposed changes but excluding the test?
------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From ayang at openjdk.java.net Wed Feb 16 15:16:22 2022 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 16 Feb 2022 15:16:22 GMT Subject: RFR: 8281971: Remove unimplemented InstanceRefKlass::do_next Message-ID: Trivial change of removing dead code. Test: build ------------- Commit messages: - trivial Changes: https://git.openjdk.java.net/jdk/pull/7497/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7497&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281971 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7497.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7497/head:pull/7497 PR: https://git.openjdk.java.net/jdk/pull/7497 From lkorinth at openjdk.java.net Wed Feb 16 17:05:13 2022 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Wed, 16 Feb 2022 17:05:13 GMT Subject: RFR: 8269537: memset() is called after operator new [v4] In-Reply-To: References: Message-ID: On Wed, 20 Oct 2021 09:36:38 GMT, Leo Korinth wrote: >> The basic problem is that we are relying on undefined behaviour, as documented in the code: >> >> // This whole business of passing information from ResourceObj::operator new >> // to the ResourceObj constructor via fields in the "object" is technically UB. >> // But it seems to work within the limitations of HotSpot usage (such as no >> // multiple inheritance) with the compilers and compiler options we're using. >> // And it gives some possibly useful checking for misuse of ResourceObj. >> >> >> I am removing the undefined behaviour by passing the type of allocation through a thread local variable. 
>> >> This solution has some advantages: >> 1) it is not UB >> 2) it is simpler and easier to understand >> 3) it uses less memory (I could make it use even less if I made the enum `allocation_type` a u8) >> 4) in the *very* unlikely situation that stack memory (or embedded) already equals the data calculated from the address of the object, the code will also work. >> >> When doing the change, I also updated `allocated_on_stack()` to the new name `allocated_on_stack_or_embedded()` which is much harder to misinterpret. >> >> I also disallow to "fake" the memory type by explicitly calling `ResourceObj::set_allocation_type`. >> >> This forced me to change two places that is faking the allocation type of an embedded `GrowableArray` from `STACK_OR_EMBEDDED` to `C_HEAP`. The faking of the type is hard to understand as a `STACK_OR_EMBEDDED` `GrowableArray` can allocate any type of object. My guess is that `GrowableArray` has changed behaviour, or maybe that it was hard to understand because the old naming of `allocated_on_stack()`. >> >> I have also tried to update the comments. In doing that I not only changed the comments for this change, but also for the *incorrect* advice to always delete object you allocate with new. >> >> Testing on debug build tier1-3 >> Testing on release build tier1 > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > review updates This comment will keep this pull request alive a bit longer. 
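The thread-local hand-off described in the quoted summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the pull request: `SketchObj`, `AllocType` and `pending_alloc` are hypothetical names, and only two allocation kinds are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical allocation kinds; the real ResourceObj has more.
enum class AllocType { StackOrEmbedded, CHeap };

// Hypothetical per-thread slot that carries the allocation kind from
// operator new to the constructor, instead of pre-writing object fields.
static thread_local AllocType pending_alloc = AllocType::StackOrEmbedded;

class SketchObj {
  AllocType _type;

 public:
  // The constructor consumes the pending value and resets it, so an object
  // that did not come through operator new is stack-or-embedded by default.
  SketchObj() : _type(pending_alloc) {
    pending_alloc = AllocType::StackOrEmbedded;
  }
  // Copies are fresh objects; they do not inherit the source's kind.
  SketchObj(const SketchObj&) : SketchObj() {}

  static void* operator new(std::size_t size) {
    void* p = ::operator new(size);   // may throw; the slot is untouched then
    pending_alloc = AllocType::CHeap; // record the kind for the constructor
    return p;
  }
  static void operator delete(void* p) { ::operator delete(p); }

  bool allocated_on_stack_or_embedded() const {
    return _type == AllocType::StackOrEmbedded;
  }
  bool allocated_on_c_heap() const { return _type == AllocType::CHeap; }
};
```

No object memory is read before construction here, which is the property the summary is after. One limitation of this simplified version: if a constructor argument expression builds another `SketchObj` between `operator new` and the constructor call, that temporary would consume the pending value, so a fuller design would need to guard against that case.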
------------- PR: https://git.openjdk.java.net/jdk/pull/5387 From iklam at openjdk.java.net Wed Feb 16 17:49:11 2022 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 16 Feb 2022 17:49:11 GMT Subject: RFR: 8275731: CDS archived enums objects are recreated at runtime [v3] In-Reply-To: References: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> <7c6mh2-s3SkpfGG1WptyZsJjTfcDy1wX0Ll0713MLkU=.7df74a01-7ea5-49c1-9bda-f73798df3852@github.com> Message-ID: On Wed, 19 Jan 2022 05:50:50 GMT, Ioi Lam wrote: >> I don't really know this code well enough to do a good code review. I had some comments though. > >> I don't really know this code well enough to do a good code review. I had some comments though. > > Hi Coleen, thanks for taking a look. > > This PR has two major parts: > > 1. Check for inappropriate reference to static fields. This is mainly done in cdsHeapVerifier.cpp. These checks don't affect the contents of the CDS archive. They just print out warnings if problems are found. > 2. Special initialization of enum classes. Essentially if any instance of an enum class `X` is archived, then `X::` will not be executed, and we'll take this path instead (in instanceKlass.cpp): > > > // This is needed to ensure the consistency of the archived heap objects. > if (has_archived_enum_objs()) { > assert(is_shared(), "must be"); > bool initialized = HeapShared::initialize_enum_klass(this, CHECK); > if (initialized) { > return; > } > } > > Could you check if (2) is correct? > @iklam This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! 
keepalive ------------- PR: https://git.openjdk.java.net/jdk/pull/6653 From dholmes at openjdk.java.net Wed Feb 16 21:32:02 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 16 Feb 2022 21:32:02 GMT Subject: RFR: 8281971: Remove unimplemented InstanceRefKlass::do_next In-Reply-To: References: Message-ID: On Wed, 16 Feb 2022 15:10:26 GMT, Albert Mingkun Yang wrote: > Trivial change of removing dead code. > > Test: build Looks good and trivial. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7497 From joe.darcy at oracle.com Wed Feb 16 22:20:20 2022 From: joe.darcy at oracle.com (Joseph D. Darcy) Date: Wed, 16 Feb 2022 14:20:20 -0800 Subject: RFR: 8279508: Auto-vectorize Math.round API [v2] In-Reply-To: References: <2TVKx_BFFyAK2ooOWKpdsEIMFzJngYxlWjbgeZ2y4Mc=.5deb2173-8107-476d-92ca-1835d69ce336@github.com> Message-ID: <6e3a21d8-fc16-24b3-ead1-fefb52db9684@oracle.com> On 2/12/2022 6:55 PM, Jatin Bhateja wrote: > On Fri, 21 Jan 2022 00:49:04 GMT, Sandhya Viswanathan wrote: > >> The JVM currently initializes the x86 mxcsr to round to nearest even, see below in stubGenerator_x86_64.cpp: // Round to nearest (even), 64-bit mode, exceptions masked StubRoutines::x86::_mxcsr_std = 0x1F80; The above works for Math.rint which is specified to be round to nearest even. Please see: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html : section 4.8.4 >> >> The rounding mode needed for Math.round is round to positive infinity which needs a different x86 mxcsr initialization(0x5F80). > Hi @sviswa7 , > As per JLS 17 section 15.4 Java follows round to nearest rounding policy for all floating point operations except conversion to integer and remainder where it uses round toward zero. 
That is a true background condition, but I will note that the Math.round method does independently define the semantics of its operation and rounding behavior, which has changed (slightly) over the lifetime of the platform. -Joe From jiefu at openjdk.java.net Wed Feb 16 23:27:12 2022 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 16 Feb 2022 23:27:12 GMT Subject: RFR: 8281467: Allow larger OptoLoopAlignment and CodeEntryAlignment In-Reply-To: References: Message-ID: <9JilFxfj4hSQbkBrNAgpkhjPiGyYFGn0dCR_9SypWKg=.32cee859-7533-4109-871b-44f20d1a89e3@github.com> On Tue, 15 Feb 2022 06:17:57 GMT, Aleksey Shipilev wrote: >> I am following up on the performance issue where the culprit seems to be the too low `OptoLoopAlignment`. To perform better experiments, I suggest allowing larger alignments. >> >> Note that we cannot make `OptoLoopAlignment` larger than `CodeEntryAlignment`, because nmethod copy would break it, see assert in `MacroAssembler::align`. See [JDK-8273459](https://bugs.openjdk.java.net/browse/JDK-8273459) for latest discussion about it. So `CodeEntryAlignment` needs to be configurable as well. >> >> The default values for options are different per platform, so tests are x86_64 specific. >> >> No default value is changed, this only unblocks experiments. >> >> Additional testing: >> - [x] New tests on Linux x86_64 fastdebug >> - [x] New tests on Linux x86_64 release > > Thank you! Hi @shipilev , compiler/arguments/TestCodeEntryAlignment.java fails on AVX512 machines. Please have a look: https://github.com/openjdk/jdk/pull/7485 Thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7388 From jbhateja at openjdk.java.net Thu Feb 17 03:44:02 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 17 Feb 2022 03:44:02 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v5] In-Reply-To: <-NfiIwcnrf7TRNxA9x1d9itPvKYgeCYogpjSZgGYtvc=.15346702-2db7-4295-8e5a-a4864f3bbdbd@github.com> References: <-NfiIwcnrf7TRNxA9x1d9itPvKYgeCYogpjSZgGYtvc=.15346702-2db7-4295-8e5a-a4864f3bbdbd@github.com> Message-ID: On Wed, 16 Feb 2022 12:30:27 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- | -- >> FpRoundingBenchmark.test_round_double | 1024.00 | 584.99 | 1870.70 | 3.20 | 510.35 | 548.60 | 1.07 >> FpRoundingBenchmark.test_round_double | 2048.00 | 257.17 | 965.33 | 3.75 | 293.60 | 273.15 | 0.93 >> FpRoundingBenchmark.test_round_float | 1024.00 | 825.69 | 3592.54 | 4.35 | 825.32 | 1836.42 | 2.23 >> FpRoundingBenchmark.test_round_float | 2048.00 | 388.55 | 1895.77 | 4.88 | 412.31 | 945.82 | 2.29 >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - 8279508: Adding few descriptive comments. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508 > - 8279508: Replacing by efficient instruction sequence based on MXCSR.RC mode. 
> - 8279508: Adding vectorized algorithms to match the semantics of rounding operations. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8279508 > - 8279508: Adding a test for scalar intrinsification. > - 8279508: Auto-vectorize Math.round API > _Mailing list message from [Joseph D. Darcy](mailto:joe.darcy at oracle.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_ > > On 2/12/2022 6:55 PM, Jatin Bhateja wrote: > > > On Fri, 21 Jan 2022 00:49:04 GMT, Sandhya Viswanathan wrote: > > > The JVM currently initializes the x86 mxcsr to round to nearest even, see below in stubGenerator_x86_64.cpp: // Round to nearest (even), 64-bit mode, exceptions masked StubRoutines::x86::_mxcsr_std = 0x1F80; The above works for Math.rint which is specified to be round to nearest even. Please see: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html : section 4.8.4 > > > The rounding mode needed for Math.round is round to positive infinity which needs a different x86 mxcsr initialization(0x5F80). > > > Hi @sviswa7 , > > > As per JLS 17 section 15.4 Java follows round to nearest rounding policy for all floating point operations except conversion to integer and remainder where it uses round toward zero. > > That is a true background condition, but I will note that the Math.round method does independently define the semantics of its operation and rounding behavior, which has changed (slightly) over the lifetime of the platform. > > -Joe Hi @jddarcy , Thanks for your comments, patch has been updated to follow the prescribed semantics of Math.round API. 
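For readers following the rounding-mode exchange above, here is a small Java sketch of the two behaviours being contrasted: `Math.rint` rounds ties to even, while `Math.round` rounds ties toward positive infinity. The class name `RoundingDemo` and its helper methods are illustrative only and are not part of the patch; the `0.49999999999999994` case reflects my understanding of the current `Math.round` specification and is offered as one example of the kind of slight change Joe alludes to.

```java
// Illustrative sketch only; RoundingDemo is a hypothetical name, not part of PR 7094.
final class RoundingDemo {
    /** Math.round: nearest long, ties toward positive infinity (2.5 -> 3, -2.5 -> -2). */
    static long roundTiesUp(double x) {
        return Math.round(x);
    }

    /** Math.rint: nearest mathematical integer as a double, ties to even (2.5 -> 2.0). */
    static double roundTiesEven(double x) {
        return Math.rint(x);
    }

    public static void main(String[] args) {
        double[] samples = {0.5, 1.5, 2.5, -0.5, -2.5, 0.49999999999999994};
        for (double x : samples) {
            System.out.println(x + ": round=" + roundTiesUp(x) + ", rint=" + roundTiesEven(x));
        }
    }
}
```

The two methods only disagree on exact ties such as 0.5 and 2.5, which is why a single MXCSR rounding mode does not cover both.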
------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From duke at openjdk.java.net Thu Feb 17 08:00:28 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Thu, 17 Feb 2022 08:00:28 GMT Subject: RFR: 8281544: assert(VM_Version::supports_avx512bw()) failed for Tests jdk/incubator/vector/ Message-ID: `ZSaveLiveRegisters::ZSaveLiveRegisters` stores live registers, and later they are loaded again. This includes opmask registers, which are part of AVX512. However, not all platforms have all of the AVX512 instructions. For example, Knights Landing has general AVX512 support and makes use of opmask registers, but does not support the AVX512 BW subset of instructions; specifically, it does not support the `kmovql` instruction. Platforms like Cannon Lake have support for AVX512 BW. Solution: in analogy to `RegisterSaver::save_live_registers`, which seems to perform a very similar task, use `MacroAssembler::kmov` instead of `kmovql` directly. Internally, `kmov` chooses `kmovql` if avx512bw is available, and `kmovwl` otherwise. As a regression test, I took one of the tests that failed with `-XX:+UnlockExperimentalVMOptions -XX:+UseZGC`, and added an additional `@run` statement with those flags. I simulated this test locally with `sde -knl` (Knights Landing, AVX512 but not BW, fails without change to `kmov`, passes with it) and `sde -cnl` (Cannon Lake, has AVX512 BW, passes before and after code change). Ran additional tests to verify that the test triggers before code change, and that with the code change nothing broke. @neliasso Thanks for the help!
------------- Commit messages: - 8281544: assert(VM_Version::supports_avx512bw()) failed for Tests jdk/incubator/vector/ Changes: https://git.openjdk.java.net/jdk/pull/7510/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7510&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281544 Stats: 14 lines in 2 files changed: 12 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7510.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7510/head:pull/7510 PR: https://git.openjdk.java.net/jdk/pull/7510 From duke at openjdk.java.net Thu Feb 17 08:03:36 2022 From: duke at openjdk.java.net (KIRIYAMA Takuya) Date: Thu, 17 Feb 2022 08:03:36 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. [v2] In-Reply-To: References: Message-ID: > I think JFR should report an error message and the JVM should shut down safely instead of failing a guarantee. > > For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops the JVM, as shown below, > by using JfrJavaSupport::abort(). > > [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... > > I modified StreamWriterHost to call JfrJavaSupport::abort() instead of failing the guarantee. > I added an argument to JfrJavaSupport::abort() that tells os::abort() not to write a core dump > because there is no space left on the device. > Could you please review the fix?
KIRIYAMA Takuya has updated the pull request incrementally with one additional commit since the last revision: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7227/files - new: https://git.openjdk.java.net/jdk/pull/7227/files/3c160ab5..c2ad1c39 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7227&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7227&range=00-01 Stats: 127 lines in 1 file changed: 0 ins; 127 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7227.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7227/head:pull/7227 PR: https://git.openjdk.java.net/jdk/pull/7227 From duke at openjdk.java.net Thu Feb 17 08:03:38 2022 From: duke at openjdk.java.net (KIRIYAMA Takuya) Date: Thu, 17 Feb 2022 08:03:38 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. In-Reply-To: References: Message-ID: <-YSoVkfzQH5LQTzJgUqPKIEuicdyUCrugxQfSb6WVL0=.61724f11-e25f-4f49-a6c4-6a76f4f539e4@github.com> On Wed, 16 Feb 2022 13:04:36 GMT, Markus Grönlund wrote: > Takuya, can I suggest keeping your proposed changes but excluding the test? OK. This test is indeed risky; I will remove it.
------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From rrich at openjdk.java.net Thu Feb 17 08:33:05 2022 From: rrich at openjdk.java.net (Richard Reingruber) Date: Thu, 17 Feb 2022 08:33:05 GMT Subject: RFR: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler In-Reply-To: References: Message-ID: <-5PaPJvfgixjpHechVa5bra26EWo4n-hvDnYlM4jXO8=.f709cab3-c666-42f2-bfda-c4caa2709755@github.com> On Tue, 15 Feb 2022 09:40:28 GMT, Aleksey Shipilev wrote: > Similar to [JDK-8281744](https://bugs.openjdk.java.net/browse/JDK-8281744), this change improves `TemplateInterpreterGenerator::generate_slow_signature_handler`: there are only a few moves between the jumps, and we can tell `MacroAssembler` those can be short. This code is used to process arguments after the slow call to VM, so the performance improvement is drowned by the call itself. This makes interpreter code a bit more compact, though. > > Additional testing: > - [x] Linux x86_64 fastdebug `hotspot:tier1` > - [x] Linux x86_32 fastdebug `hotspot:tier1` On s390, being CISC too, there are similar issues. We addressed them with `NearLabel`, `branch_optimized`, and `compare_and_branch_optimized`. They provide a higher level of abstraction which helps writing better code without knowing all the details, which at least I instantly forget after looking into the manual. ------------- PR: https://git.openjdk.java.net/jdk/pull/7475 From shade at openjdk.java.net Thu Feb 17 08:38:08 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 17 Feb 2022 08:38:08 GMT Subject: RFR: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler In-Reply-To: <-5PaPJvfgixjpHechVa5bra26EWo4n-hvDnYlM4jXO8=.f709cab3-c666-42f2-bfda-c4caa2709755@github.com> References: <-5PaPJvfgixjpHechVa5bra26EWo4n-hvDnYlM4jXO8=.f709cab3-c666-42f2-bfda-c4caa2709755@github.com> Message-ID: On Thu, 17 Feb 2022 08:30:04 GMT, Richard Reingruber wrote: > On s390, being CISC too, there are similar issues. 
We addressed them with `NearLabel`, `branch_optimized`, and `compare_and_branch_optimized`. They provide a higher level of abstraction which helps writing better code without knowing all the details, which at least I instantly forget after looking into the manual. In x86 `MacroAssembler` there are `jcc` and `jccb` for this. When `MacroAssembler` can make `jcc`, it would, but that requires the jump target to be already bound, so that jump offset is already known. For *forward* jumps, though, `MacroAssembler` cannot know this, so in those cases we need to tell it explicitly. `NearLabel` looks like another way of doing so. ------------- PR: https://git.openjdk.java.net/jdk/pull/7475 From rrich at openjdk.java.net Thu Feb 17 08:47:12 2022 From: rrich at openjdk.java.net (Richard Reingruber) Date: Thu, 17 Feb 2022 08:47:12 GMT Subject: RFR: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler In-Reply-To: References: Message-ID: On Tue, 15 Feb 2022 09:40:28 GMT, Aleksey Shipilev wrote: > Similar to [JDK-8281744](https://bugs.openjdk.java.net/browse/JDK-8281744), this change improves `TemplateInterpreterGenerator::generate_slow_signature_handler`: there are only a few moves between the jumps, and we can tell `MacroAssembler` those can be short. This code is used to process arguments after the slow call to VM, so the performance improvement is drowned by the call itself. This makes interpreter code a bit more compact, though. > > Additional testing: > - [x] Linux x86_64 fastdebug `hotspot:tier1` > - [x] Linux x86_32 fastdebug `hotspot:tier1` > > On s390, being CISC too, there are similar issues. We addressed them with `NearLabel`, `branch_optimized`, and `compare_and_branch_optimized`. They provide a higher level of abstraction which helps writing better code without knowing all the details, which at least I instantly forget after looking into the manual. > > In x86 `MacroAssembler` there are `jcc` and `jccb` for this. 
When `MacroAssembler` can make `jcc`, it would, but that requires the jump target to be already bound, so that jump offset is already known. For _forward_ jumps, though, `MacroAssembler` cannot know this, so in those cases we need to tell it explicitly. `NearLabel` looks like another way of doing so. Yes, it is another way of doing so. For me the intent is clearer. Also, you can pass a `NearLabel` to an assembler method that takes a `Label` parameter and there you can optimize if the passed `Label` is actually a `NearLabel`. ------------- PR: https://git.openjdk.java.net/jdk/pull/7475 From shade at openjdk.java.net Thu Feb 17 08:47:12 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 17 Feb 2022 08:47:12 GMT Subject: RFR: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler In-Reply-To: References: Message-ID: <4DVKrNLAlMWNwHrtup6OZNim_oH5WjYpJ1GClgFU5w0=.557f929e-10d0-48d7-b3e3-2e5cf2d98225@github.com> On Thu, 17 Feb 2022 08:43:33 GMT, Richard Reingruber wrote: > Yes, it is another way of doing so. For me the intent is clearer. Also, you can pass a `NearLabel` to an assembler method that takes a `Label` parameter and there you can optimize if the passed `Label` is actually a `NearLabel`. True. I would like to consider that out of scope for this PR, would you agree?
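To make the `jcc` versus `jccb` point in this thread concrete, here is a simplified Java model of the sizing decision. It is a sketch under stated assumptions, not HotSpot code: the class `JumpSizing` and its methods are hypothetical, and the only facts relied on are the x86 encodings (a short conditional jump is 2 bytes with an 8-bit displacement, a near one is 6 bytes with a 32-bit displacement, both relative to the end of the instruction).

```java
// Simplified model of short vs. near conditional jump selection.
// JumpSizing and its methods are hypothetical names used for illustration.
final class JumpSizing {
    static final int SHORT_JCC_BYTES = 2; // one opcode byte + rel8
    static final int NEAR_JCC_BYTES = 6;  // two opcode bytes + rel32

    /** True if the displacement fits in a signed 8-bit field. */
    static boolean fitsInRel8(int displacement) {
        return displacement >= -128 && displacement <= 127;
    }

    /**
     * Backward jump: the target is already bound, so the displacement is known
     * and the assembler can pick the short form by itself when it fits.
     */
    static int backwardJccBytes(int jumpOffset, int targetOffset) {
        // Displacement is measured from the end of the (short) instruction.
        int displacement = targetOffset - (jumpOffset + SHORT_JCC_BYTES);
        return fitsInRel8(displacement) ? SHORT_JCC_BYTES : NEAR_JCC_BYTES;
    }

    /**
     * Forward jump: the target is not bound yet, so the size must be fixed now.
     * Without a hint from the caller the model conservatively reserves the near form.
     */
    static int forwardJccBytes(boolean callerPromisesShort) {
        return callerPromisesShort ? SHORT_JCC_BYTES : NEAR_JCC_BYTES;
    }
}
```

In this model the explicit hint plays the role that `jccb` (or a `NearLabel` on s390) plays in the discussion: it carries information the assembler cannot derive for an unbound target.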
------------- PR: https://git.openjdk.java.net/jdk/pull/7475 From rrich at openjdk.java.net Thu Feb 17 08:53:16 2022 From: rrich at openjdk.java.net (Richard Reingruber) Date: Thu, 17 Feb 2022 08:53:16 GMT Subject: RFR: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler In-Reply-To: <4DVKrNLAlMWNwHrtup6OZNim_oH5WjYpJ1GClgFU5w0=.557f929e-10d0-48d7-b3e3-2e5cf2d98225@github.com> References: <4DVKrNLAlMWNwHrtup6OZNim_oH5WjYpJ1GClgFU5w0=.557f929e-10d0-48d7-b3e3-2e5cf2d98225@github.com> Message-ID: <9UaPruI3qc2BWhAtsCFzwqGybDqDWvlBRZfdvn5-K5U=.c6fe43a9-ad61-4fb2-bf32-d78a4d1a02c3@github.com> On Thu, 17 Feb 2022 08:45:00 GMT, Aleksey Shipilev wrote: > > Yes it is another way of doing so. For me the intend is clearer. Also you can pass a `NearLabel` to an assembler method that takes a `Label` parameter and there you can optimize if the passed `Label` is actually a `NearLabel`. > > True. I would like to consider that out of scope for this PR, would you agree? Of course. ------------- PR: https://git.openjdk.java.net/jdk/pull/7475 From rrich at openjdk.java.net Thu Feb 17 09:02:03 2022 From: rrich at openjdk.java.net (Richard Reingruber) Date: Thu, 17 Feb 2022 09:02:03 GMT Subject: RFR: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler In-Reply-To: References: Message-ID: On Tue, 15 Feb 2022 09:40:28 GMT, Aleksey Shipilev wrote: > Similar to [JDK-8281744](https://bugs.openjdk.java.net/browse/JDK-8281744), this change improves `TemplateInterpreterGenerator::generate_slow_signature_handler`: there are only a few moves between the jumps, and we can tell `MacroAssembler` those can be short. This code is used to process arguments after the slow call to VM, so the performance improvement is drowned by the call itself. This makes interpreter code a bit more compact, though. > > Additional testing: > - [x] Linux x86_64 fastdebug `hotspot:tier1` > - [x] Linux x86_32 fastdebug `hotspot:tier1` Changes seem fine. Richard. 
------------- Marked as reviewed by rrich (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7475 From mgronlun at openjdk.java.net Thu Feb 17 10:28:04 2022 From: mgronlun at openjdk.java.net (Markus Grönlund) Date: Thu, 17 Feb 2022 10:28:04 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. [v2] In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 08:03:36 GMT, KIRIYAMA Takuya wrote: >> I think JFR should report an error message and the JVM should shut down safely instead of failing a guarantee. >> >> For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops the JVM, as shown below, >> by using JfrJavaSupport::abort(). >> >> [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) >> [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) >> [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... >> >> I modified StreamWriterHost to call JfrJavaSupport::abort() instead of failing the guarantee. >> I added an argument to JfrJavaSupport::abort() that tells os::abort() not to write a core dump >> because there is no space left on the device. >> Could you please review the fix? > > KIRIYAMA Takuya has updated the pull request incrementally with one additional commit since the last revision: > > 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. Changes requested by mgronlun (Reviewer). src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp line 90: > 88: JfrJavaSupport::abort(JfrJavaSupport::new_string(msg, jt), jt, false); > 89: } > 90: else { The else block can be removed. Just put the guarantee inline with the other code.
------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From ayang at openjdk.java.net Thu Feb 17 11:46:03 2022 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 17 Feb 2022 11:46:03 GMT Subject: RFR: 8281971: Remove unimplemented InstanceRefKlass::do_next In-Reply-To: References: Message-ID: On Wed, 16 Feb 2022 15:10:26 GMT, Albert Mingkun Yang wrote: > Trivial change of removing dead code. > > Test: build Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/7497 From ayang at openjdk.java.net Thu Feb 17 11:46:04 2022 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 17 Feb 2022 11:46:04 GMT Subject: Integrated: 8281971: Remove unimplemented InstanceRefKlass::do_next In-Reply-To: References: Message-ID: <2xx8D-V_0yZ8M6ych4lMZbbHEr0Lwywyhz7yJW-2_NM=.af5cddf0-452c-4ac0-99df-f2e79902b22f@github.com> On Wed, 16 Feb 2022 15:10:26 GMT, Albert Mingkun Yang wrote: > Trivial change of removing dead code. > > Test: build This pull request has now been integrated. Changeset: 3b7a3cfc Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/3b7a3cfce345cc900e042c5378d35d1237bdcd78 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod 8281971: Remove unimplemented InstanceRefKlass::do_next Reviewed-by: dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/7497 From duke at openjdk.java.net Thu Feb 17 12:01:56 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Thu, 17 Feb 2022 12:01:56 GMT Subject: RFR: 8281544: assert(VM_Version::supports_avx512bw()) failed for Tests jdk/incubator/vector/ [v2] In-Reply-To: References: Message-ID: > `ZSaveLiveRegisters::ZSaveLiveRegisters` stores live registers, and later they are loaded again. > This includes opmask registers, which are part of AVX512. However, not all platforms have all of the AVX512 instructions. 
> For example, Knights Landing has general AVX512 support and makes use of opmask registers, but does not support the AVX512 BW subset of instructions; specifically, it does not support the `kmovql` instruction. Platforms like Cannon Lake have support for AVX512 BW. > > Solution: in analogy to `RegisterSaver::save_live_registers`, which seems to perform a very similar task, use `MacroAssembler::kmov` instead of `kmovql` directly. Internally, `kmov` chooses `kmovql` if avx512bw is available, and `kmovwl` otherwise. > > As a regression test, I took one of the tests that failed with `-XX:+UnlockExperimentalVMOptions -XX:+UseZGC`, and added an additional `@run` statement with those flags. I simulated this test locally with Intel Software Development Emulator: > `sde -knl`: Knights Landing, AVX512 but not BW, fails without change to `kmov`, passes with it. > `sde -cnl`: Cannon Lake, has AVX512 BW, passes before and after code change. > > Ran additional tests to verify that the test triggers before code change, and that with the code change nothing broke. > > @neliasso Thanks for the help!
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix indentation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7510/files - new: https://git.openjdk.java.net/jdk/pull/7510/files/9e4169fb..7636119d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7510&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7510&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7510.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7510/head:pull/7510 PR: https://git.openjdk.java.net/jdk/pull/7510 From redestad at openjdk.java.net Thu Feb 17 15:28:57 2022 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 17 Feb 2022 15:28:57 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v3] In-Reply-To: References: Message-ID: <7kC4xxWon70YnYlqH_KJFTa2eEJf-P3VQ1L9ahugJgk=.0943bcaa-b53d-4216-afa1-69496dac248a@github.com> > I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range that are all non-negative. This allows for dealing much more efficiently with those `byte[]`s that have an ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input. > > Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904 > > - Only implemented on x86 for now, but I want to verify that implementations of `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. This pretty much means holding this up until it's implemented on all platforms, which can either be contributed to this PR or done as dependent follow-ups.
> > - An alternative to holding up until all platforms are on board is to allow the implementation of `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented. One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field. > > - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`. Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even see a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix). Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 25 additional commits since the last revision: - Revert micro changes, split out to #7516 - Merge branch 'master' of https://github.com/cl4es/jdk into count_positives - Merge branch 'master' into count_positives - Restore partial vector checks in AVX2 and SSE intrinsic variants - Let countPositives use hasNegatives to allow ports not implementing the countPositives intrinsic to stay neutral - Simplify changes to encodeUTF8 - Fix little-endian error caught by testing - Reduce jumps in the ascii path - Remove unused tail_mask - Remove has_negatives intrinsic on x86 (and hook up 32-bit x86 to use count_positives) - ... 
and 15 more: https://git.openjdk.java.net/jdk/compare/1ca44ef9...531139a1 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7231/files - new: https://git.openjdk.java.net/jdk/pull/7231/files/c4bb3612..531139a1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=01-02 Stats: 10910 lines in 329 files changed: 7340 ins; 2150 del; 1420 mod Patch: https://git.openjdk.java.net/jdk/pull/7231.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7231/head:pull/7231 PR: https://git.openjdk.java.net/jdk/pull/7231 From zgu at openjdk.java.net Thu Feb 17 15:52:16 2022 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 17 Feb 2022 15:52:16 GMT Subject: RFR: JDK-8281015: Further simplify NMT backend In-Reply-To: References: Message-ID: On Mon, 31 Jan 2022 08:12:02 GMT, Thomas Stuefe wrote: > NMT backend can be further simplified and cleaned out. > > - some entry points require NMT_TrackingLevel as arguments, some use the global tracking level. Ultimately, every part of NMT always uses the global tracking level, so in many cases the explicit parameter can be removed and the global tracking level can be used instead. > - `MemTracker::malloc_header_size(level)` + `MemTracker::malloc_footer_size(level)` are fused into `MemTracker::overhead_per_malloc()` > - when adding to `MallocSiteTable`, caller gets back a shortcut to the entry. That shortcut is stored verbatim in the malloc header. It consists of two 16-bit values (bucket index and chain position). That tupel finds its way into many argument lists. It can be simplified into single 32-bit opaque marker. Code outside the MallocSiteTable does not need to know what it is. > - Currently, the `MallocHeader` class contains a lot of logic. It accounts (in constructor) and de-accounts (in `MallocHeader::release()`). 
It would simplify code if `MallocHeader` were just a dumb data carrier and the `MallocTracker` would do the actual work. > - `MallocHeader` can be simplified, almost all members made constant and modifying accessors removed. > - In some places we handle inputptr=NULL gracefully where we should assert instead > - Expressions like `MemTracker::tracking_level() != NMT_off` can be simplified to `MemTracker::enabled()`. > - MemTracker::malloc_base (all variants) can be removed. Note that we have MallocTracker::malloc_header, which achieves the same and does not require casting to the header. > > Testing: > > - GHAs > - manually ran NMT gtests (all NMT modes) and NMT jtreg tests on Ubuntu x64 > - SAP nightlies ran through. Note that since 8275301 "Unify C-heap buffer overrun checks into NMT" NMT is enabled by default in debug builds, so it gets a lot more workout in tests now. > > Note that I wanted to manually verify that the gdb "call pp" command still works in order to not break Zhengyu's recent addition, but found its already broken. I filed https://bugs.openjdk.java.net/browse/JDK-8281023 and am preparing a separate patch. Overall is good, a few minor comments. src/hotspot/share/services/mallocSiteTable.cpp line 161: > 159: // Access malloc site > 160: MallocSite* MallocSiteTable::malloc_site(uint32_t marker) { > 161: uint16_t bucket_idx = bucket_idx_from_marker(marker); Please restore assert on bucket_idx. src/hotspot/share/services/mallocTracker.hpp line 296: > 294: NOT_LP64(uint32_t _alt_canary); > 295: const size_t _size; > 296: const uint32_t _mst_marker; make mst_marker a struct? instead of opaque type. src/hotspot/share/services/memTracker.hpp line 115: > 113: static inline void* record_free(void* memblock, NMT_TrackingLevel level) { > 114: // Never turned on > 115: if (level == NMT_off || memblock == NULL) { Wanna add assert `memblock != NULL`? ------------- Marked as reviewed by zgu (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7283 From duke at openjdk.java.net Thu Feb 17 16:17:09 2022 From: duke at openjdk.java.net (duke) Date: Thu, 17 Feb 2022 16:17:09 GMT Subject: Withdrawn: 8277930: Add unsafe allocation event to jfr In-Reply-To: References: Message-ID: On Mon, 29 Nov 2021 12:06:02 GMT, xpbob wrote: > Unsafe is used in many Java frameworks. > When the framework has a unsafe memory leak , there is no way to know what code is causing it. > Add unsafe allocation event to jfr. > Records the size and stack allocated. > This event is off by default This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/6591 From jbhateja at openjdk.java.net Thu Feb 17 17:43:43 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 17 Feb 2022 17:43:43 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v6] In-Reply-To: References: Message-ID: > Summary of changes: > - Intrinsify Math.round(float) and Math.round(double) APIs. > - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. > - Test creation using new IR testing framework. > > Following are the performance number of a JMH micro included with the patch > > Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) > > > TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio > -- | -- | -- | -- | -- | -- | -- > 1024.00 | 510.41 | 1811.66 | 3.55 | 510.40 | 502.65 | 0.98 > 2048.00 | 293.52 | 984.37 | 3.35 | 304.96 | 177.88 | 0.58 > 1024.00 | 825.94 | 3387.64 | 4.10 | 750.77 | 1925.15 | 2.56 > 2048.00 | 411.91 | 1942.87 | 4.72 | 412.22 | 1034.13 | 2.51 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8279508: Fixing for windows failure. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7094/files - new: https://git.openjdk.java.net/jdk/pull/7094/files/73674fe4..f35ed9cf Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=04-05 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/7094.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094 PR: https://git.openjdk.java.net/jdk/pull/7094 From hseigel at openjdk.java.net Thu Feb 17 19:17:34 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Thu, 17 Feb 2022 19:17:34 GMT Subject: RFR: 8281472: JVM options processing silently truncates large illegal options values Message-ID: Please review this change to fix JDK-8281472. The fix prevents truncation of large illegal option values by rejecting those values if they exceed the range of their type. For example, it rejects values of int options that are not between max_int and min_int. The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux-x64 and Windows-x64. 
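As a rough illustration of the problem described for JDK-8281472, the following Java sketch contrasts a narrowing cast, which silently truncates, with a range check that rejects out-of-range input. It is only a model of the idea: the real fix lives in HotSpot's C++ argument parsing, and `IntOptionParser`, `truncatingParse`, and `checkedParse` are hypothetical names chosen for this example.

```java
import java.util.OptionalInt;

// Hypothetical model of int-typed option parsing; not the HotSpot implementation.
final class IntOptionParser {
    /** The problematic pattern: parse wide, then narrow without checking. */
    static int truncatingParse(String text) {
        return (int) Long.parseLong(text); // high bits are silently dropped
    }

    /** Parse wide, then reject anything outside the range of int. */
    static OptionalInt checkedParse(String text) {
        long wide;
        try {
            wide = Long.parseLong(text);
        } catch (NumberFormatException e) {
            return OptionalInt.empty(); // not a number, or too large even for long
        }
        if (wide < Integer.MIN_VALUE || wide > Integer.MAX_VALUE) {
            return OptionalInt.empty(); // out of range for an int option
        }
        return OptionalInt.of((int) wide);
    }
}
```

For example, `truncatingParse("4294967297")` yields 1, because 4294967297 is 2^32 + 1, whereas `checkedParse` reports the value as invalid.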
Thanks, Harold ------------- Commit messages: - 8281472: JVM options processing silently truncates large illegal options values Changes: https://git.openjdk.java.net/jdk/pull/7522/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7522&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8281472 Stats: 148 lines in 3 files changed: 145 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/7522.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7522/head:pull/7522 PR: https://git.openjdk.java.net/jdk/pull/7522 From kvn at openjdk.java.net Thu Feb 17 20:07:14 2022 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 17 Feb 2022 20:07:14 GMT Subject: RFR: 8281544: assert(VM_Version::supports_avx512bw()) failed for Tests jdk/incubator/vector/ [v2] In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 12:01:56 GMT, Emanuel Peter wrote: >> `ZSaveLiveRegisters::ZSaveLiveRegisters` stores live registers, and later they are loaded again. >> This includes opmask registers, which are part of AVX512. However, not all platforms have all of the AVX512 instructions. >> For example Knights Landing has general AVX512 support and makes use of optmask registers, but does not support the AVX512 BW subset of instructions, specifically it does not support the `kmovql` instruction. Platforms like Cannon Landing have support for AVX512 BW. >> >> Solution: in analogy to `RegisterSaver::save_live_registers`, which seems to perform a very similar task, use `MacroAssembler::kmov` instead of `kmovql` directly. Internally, `kmov` choses either `kmovql` if avx512bw is available, else it takes `kmovwl`. >> >> As a regression test, I took one of the tests that failed with `-XX:+UnlockExperimentalVMOptions -XX:+UseZGC`, and added an additional `@run` statement with those flags. I simulated this test locally with Intel Software Development Emulator: >> `sde -knl`: Knights Landing, AVX512 but not BW, fails without change to `kmov`, passes with it. 
>> `sde -cnl`: Cannon Landing, has AVX512 BW, passes before and after code change. >> >> Ran additional tests to verify that the test triggers before code change, and that with the code change nothing broke. >> >> @neliasso Thanks for the help! > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix indentation Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7510 From neliasso at openjdk.java.net Thu Feb 17 20:07:14 2022 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 17 Feb 2022 20:07:14 GMT Subject: RFR: 8281544: assert(VM_Version::supports_avx512bw()) failed for Tests jdk/incubator/vector/ [v2] In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 12:01:56 GMT, Emanuel Peter wrote: >> `ZSaveLiveRegisters::ZSaveLiveRegisters` stores live registers, and later they are loaded again. >> This includes opmask registers, which are part of AVX512. However, not all platforms have all of the AVX512 instructions. >> For example Knights Landing has general AVX512 support and makes use of optmask registers, but does not support the AVX512 BW subset of instructions, specifically it does not support the `kmovql` instruction. Platforms like Cannon Landing have support for AVX512 BW. >> >> Solution: in analogy to `RegisterSaver::save_live_registers`, which seems to perform a very similar task, use `MacroAssembler::kmov` instead of `kmovql` directly. Internally, `kmov` choses either `kmovql` if avx512bw is available, else it takes `kmovwl`. >> >> As a regression test, I took one of the tests that failed with `-XX:+UnlockExperimentalVMOptions -XX:+UseZGC`, and added an additional `@run` statement with those flags. I simulated this test locally with Intel Software Development Emulator: >> `sde -knl`: Knights Landing, AVX512 but not BW, fails without change to `kmov`, passes with it. 
>> `sde -cnl`: Cannon Landing, has AVX512 BW, passes before and after code change. >> >> Ran additional tests to verify that the test triggers before code change, and that with the code change nothing broke. >> >> @neliasso Thanks for the help! > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix indentation Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7510 From dholmes at openjdk.java.net Thu Feb 17 22:32:11 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 17 Feb 2022 22:32:11 GMT Subject: RFR: 8281472: JVM options processing silently truncates large illegal options values In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 19:09:26 GMT, Harold Seigel wrote: > Please review this change to fix JDK-8281472. The fix prevents truncation of large illegal option values by rejecting those values if they exceed the range of their type. For example, it rejects values of int options that are not between max_int and min_int. > > The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux-x64 and Windows-x64. > > Thanks, Harold src/hotspot/share/runtime/arguments.cpp line 874: > 872: if (v > (julong)max_juint + 1) { > 873: return false; > 874: } This seems very suspicious. Where is the code that depends on this? ------------- PR: https://git.openjdk.java.net/jdk/pull/7522 From dholmes at openjdk.java.net Thu Feb 17 22:57:50 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 17 Feb 2022 22:57:50 GMT Subject: RFR: 8281472: JVM options processing silently truncates large illegal options values In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 19:09:26 GMT, Harold Seigel wrote: > Please review this change to fix JDK-8281472. The fix prevents truncation of large illegal option values by rejecting those values if they exceed the range of their type. 
For example, it rejects values of int options that are not between max_int and min_int. > > The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux-x64 and Windows-x64. > > Thanks, Harold A gtest would seem far simpler to write and allow for easy checking of all the interesting boundary cases: - 0 +/- 1 - max jint +/- 1 - min jint +/- 1 - etc ------------- PR: https://git.openjdk.java.net/jdk/pull/7522 From ccheung at openjdk.java.net Thu Feb 17 23:24:05 2022 From: ccheung at openjdk.java.net (Calvin Cheung) Date: Thu, 17 Feb 2022 23:24:05 GMT Subject: RFR: 8275731: CDS archived enums objects are recreated at runtime [v4] In-Reply-To: References: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> Message-ID: On Wed, 19 Jan 2022 05:47:57 GMT, Ioi Lam wrote: >> **Background:** >> >> In the Java Language, Enums can be tested for equality, so the constants in an Enum type must be unique. Javac compiles an enum declaration like this: >> >> >> public enum Day { SUNDAY, MONDAY ... } >> >> >> to >> >> >> public class Day extends java.lang.Enum { >> public static final SUNDAY = new Day("SUNDAY"); >> public static final MONDAY = new Day("MONDAY"); ... >> } >> >> >> With CDS archived heap objects, `Day::` is executed twice: once during `java -Xshare:dump`, and once during normal JVM execution. If the archived heap objects references one of the Enum constants created at dump time, we will violate the uniqueness requirements of the Enum constants at runtime. See the test case in the description of [JDK-8275731](https://bugs.openjdk.java.net/browse/JDK-8275731) >> >> **Fix:** >> >> During -Xshare:dump, if we discovered that an Enum constant of type X is archived, we archive all constants of type X. At Runtime, type X will skip the normal execution of `X::`. Instead, we run `HeapShared::initialize_enum_klass()` to retrieve all the constants of X that were saved at dump time. 
>> >> This is safe as we know that `X::` has no observable side effect -- it only creates the constants of type X, as well as the synthetic value `X::$VALUES`, which cannot be observed until X is fully initialized. >> >> **Verification:** >> >> To avoid future problems, I added a new tool, CDSHeapVerifier, to look for similar problems where the archived heap objects reference a static field that may be recreated at runtime. There are some manual steps involved, but I analyzed the potential problems found by the tool are they are all safe (after the current bug is fixed). See cdsHeapVerifier.cpp for gory details. An example trace of this tool can be found at https://bugs.openjdk.java.net/secure/attachment/97242/enum_warning.txt >> >> **Testing:** >> >> Passed Oracle CI tiers 1-4. WIll run tier 5 as well. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Use InstanceKlass::do_local_static_fields for some field iterations Looks good. Minor comment below. Also, several files with copyright year 2021 need updating. src/hotspot/share/cds/cdsHeapVerifier.cpp line 63: > 61: // class Bar { > 62: // // this field is initialized in both CDS dump time and runtime. > 63: // static final Bar bar = new Bar; `new Bar` should be `new Bar()`? ------------- Marked as reviewed by ccheung (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6653 From duke at openjdk.java.net Fri Feb 18 05:44:34 2022 From: duke at openjdk.java.net (KIRIYAMA Takuya) Date: Fri, 18 Feb 2022 05:44:34 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. [v3] In-Reply-To: References: Message-ID: > I think JFR should report an error message and jvm should shut down safely instead of gurantee failure. > > For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops jvm as below > by using JfrJavaSupport::abort(). 
> > [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... > > I modified StreamWriterHost not to call guarantee failure but to call JfrJavaSupport::abort(). > I added a argument to JfrJavaSupport::abort() which tells os::abort() not to put out core > because there is no space on device. > Could you please review the fix? KIRIYAMA Takuya has updated the pull request incrementally with one additional commit since the last revision: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7227/files - new: https://git.openjdk.java.net/jdk/pull/7227/files/c2ad1c39..561cce33 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7227&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7227&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7227.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7227/head:pull/7227 PR: https://git.openjdk.java.net/jdk/pull/7227 From duke at openjdk.java.net Fri Feb 18 05:44:35 2022 From: duke at openjdk.java.net (KIRIYAMA Takuya) Date: Fri, 18 Feb 2022 05:44:35 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. 
[v2] In-Reply-To: References: Message-ID: <2b60oveZnuqrsYdJ7JapCles3df7i5PKH3ofXyy77s8=.145a285a-c169-4a55-89de-cb882312efdb@github.com> On Thu, 17 Feb 2022 10:24:19 GMT, Markus Grönlund wrote: >> KIRIYAMA Takuya has updated the pull request incrementally with one additional commit since the last revision: >> >> 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. > > src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp line 90: > >> 88: JfrJavaSupport::abort(JfrJavaSupport::new_string(msg, jt), jt, false); >> 89: } >> 90: else { > > The else block can be removed. Just put the guarantee inline with the other code. Thank you so much. You're right. I removed the else block. ------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From thartmann at openjdk.java.net Fri Feb 18 08:13:53 2022 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 18 Feb 2022 08:13:53 GMT Subject: RFR: 8281544: assert(VM_Version::supports_avx512bw()) failed for Tests jdk/incubator/vector/ [v2] In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 12:01:56 GMT, Emanuel Peter wrote: >> `ZSaveLiveRegisters::ZSaveLiveRegisters` stores live registers, and later they are loaded again. >> This includes opmask registers, which are part of AVX512. However, not all platforms have all of the AVX512 instructions. >> For example Knights Landing has general AVX512 support and makes use of opmask registers, but does not support the AVX512 BW subset of instructions, specifically it does not support the `kmovql` instruction. Platforms like Cannon Lake have support for AVX512 BW. >> >> Solution: in analogy to `RegisterSaver::save_live_registers`, which seems to perform a very similar task, use `MacroAssembler::kmov` instead of `kmovql` directly. Internally, `kmov` chooses `kmovql` if avx512bw is available, and `kmovwl` otherwise.
>> >> As a regression test, I took one of the tests that failed with `-XX:+UnlockExperimentalVMOptions -XX:+UseZGC`, and added an additional `@run` statement with those flags. I simulated this test locally with Intel Software Development Emulator: >> `sde -knl`: Knights Landing, AVX512 but not BW, fails without change to `kmov`, passes with it. >> `sde -cnl`: Cannon Landing, has AVX512 BW, passes before and after code change. >> >> Ran additional tests to verify that the test triggers before code change, and that with the code change nothing broke. >> >> @neliasso Thanks for the help! > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix indentation Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7510 From mgronlun at openjdk.java.net Fri Feb 18 11:23:51 2022 From: mgronlun at openjdk.java.net (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Fri, 18 Feb 2022 11:23:51 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. [v3] In-Reply-To: References: Message-ID: <_Ozu_sZZUPH-7vFaMfyzJFBv5WxQicM9asfJ9dK_jzg=.e9c28828-ca74-49e6-b0e8-33e79ea8a086@github.com> On Fri, 18 Feb 2022 05:44:34 GMT, KIRIYAMA Takuya wrote: >> I think JFR should report an error message and jvm should shut down safely instead of gurantee failure. >> >> For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops jvm as below >> by using JfrJavaSupport::abort(). 
>> >> [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) >> [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) >> [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... >> >> I modified StreamWriterHost not to call guarantee failure but to call JfrJavaSupport::abort(). >> I added a argument to JfrJavaSupport::abort() which tells os::abort() not to put out core >> because there is no space on device. >> Could you please review the fix? > > KIRIYAMA Takuya has updated the pull request incrementally with one additional commit since the last revision: > > 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. diff --git a/src/hotspot/share/jfr/jni/jfrJavaSupport.cpp b/src/hotspot/share/jfr/jni/jfrJavaSupport.cpp index 95b96e02c06..015d4ebe065 100644 --- a/src/hotspot/share/jfr/jni/jfrJavaSupport.cpp +++ b/src/hotspot/share/jfr/jni/jfrJavaSupport.cpp @@ -563,14 +563,16 @@ void JfrJavaSupport::throw_runtime_exception(const char* message, TRAPS) { void JfrJavaSupport::abort(jstring errorMsg, JavaThread* t) { DEBUG_ONLY(check_java_thread_in_vm(t)); - ResourceMark rm(t); - const char* const error_msg = c_str(errorMsg, t); - if (error_msg != NULL) { - log_error(jfr, system)("%s",error_msg); + abort(c_str(errorMsg, t)); +} + +void JfrJavaSupport::abort(const char* error_msg, bool dump_core /* true */) { + if (error_msg != nullptr) { + log_error(jfr, system)("%s", error_msg); } log_error(jfr, system)("%s", "An irrecoverable error in Jfr. 
Shutting down VM..."); - vm_abort(); + vm_abort(dump_core); } JfrJavaSupport::CAUSE JfrJavaSupport::_cause = JfrJavaSupport::VM_ERROR; diff --git a/src/hotspot/share/jfr/jni/jfrJavaSupport.hpp b/src/hotspot/share/jfr/jni/jfrJavaSupport.hpp index 53d6eed68a8..1ec5a884b4b 100644 --- a/src/hotspot/share/jfr/jni/jfrJavaSupport.hpp +++ b/src/hotspot/share/jfr/jni/jfrJavaSupport.hpp @@ -112,6 +112,7 @@ class JfrJavaSupport : public AllStatic { // critical static void abort(jstring errorMsg, TRAPS); + static void abort(const char* error_msg, bool dump_core = true); static void uncaught_exception(jthrowable throwable, JavaThread* t); // asserts diff --git a/src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp b/src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp index 3a7ec286381..73404a1aede 100644 --- a/src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp +++ b/src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp @@ -25,8 +25,8 @@ #ifndef SHARE_JFR_WRITERS_JFRSTREAMWRITERHOST_INLINE_HPP #define SHARE_JFR_WRITERS_JFRSTREAMWRITERHOST_INLINE_HPP +#include "jfr/jni/jfrJavaSupport.hpp" #include "jfr/writers/jfrStreamWriterHost.hpp" - #include "runtime/os.hpp" template @@ -77,6 +77,9 @@ inline void StreamWriterHost::write_bytes(const u1* buf, intptr_t l while (len > 0) { const unsigned int nBytes = len > INT_MAX ? 
INT_MAX : (unsigned int)len; const ssize_t num_written = os::write(_fd, buf, nBytes); + if (errno == ENOSPC) { + JfrJavaSupport::abort("Failed to write to jfr stream because no space left on device", false); + } guarantee(num_written > 0, "Nothing got written, or os::write() failed"); _stream_pos += num_written; len -= num_written; src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp line 88: > 86: JavaThread* jt = JavaThread::current(); > 87: ThreadInVMfromNative transition(jt); > 88: JfrJavaSupport::abort(JfrJavaSupport::new_string(msg, jt), jt, false); Hi again Takuya, I'm sorry, but I should have noticed this earlier: I now see that the code needs to allocate a Java string oop to conform to the existing abort function signature, which caters to invocations from Java. Then abort() immediately strips out the c-string from the oop. To be correct, also headers for logging/log.hpp and runtime/thread.inline.hpp should need be included. I believe we can simplify this by updating the abort() signature so that we don't need to drag in those extra dependencies. Please see my following comment where I suggest a way to do this. Thanks for your patience Markus ------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From hseigel at openjdk.java.net Fri Feb 18 13:30:52 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 18 Feb 2022 13:30:52 GMT Subject: RFR: 8281472: JVM options processing silently truncates large illegal options values In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 22:28:56 GMT, David Holmes wrote: >> Please review this change to fix JDK-8281472. The fix prevents truncation of large illegal option values by rejecting those values if they exceed the range of their type. For example, it rejects values of int options that are not between max_int and min_int. >> >> The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux-x64 and Windows-x64. 
>> >> Thanks, Harold > > src/hotspot/share/runtime/arguments.cpp line 874: > >> 872: if (v > (julong)max_juint + 1) { >> 873: return false; >> 874: } > > This seems very suspicious. Where is the code that depends on this? Test test/hotspot/jtreg/gc/arguments/TestParallelGCThreads.java has the following code: // 4294967295 == (unsigned int) -1 // So setting ParallelGCThreads=4294967295 should give back 4294967295 // and setting ParallelGCThreads=4294967296 should give back 0. (SerialGC is ok with ParallelGCThreads=0) for (long i = 4294967295L; i <= 4294967296L; i++) { long count = getParallelGCThreadCount( "-XX:+UseSerialGC", "-XX:ParallelGCThreads=" + i, "-XX:+PrintFlagsFinal", "-version"); Asserts.assertEQ(count, i % 4294967296L, "Specifying ParallelGCThreads=" + i + " does not set the thread count properly!"); } } ------------- PR: https://git.openjdk.java.net/jdk/pull/7522 From dholmes at openjdk.java.net Fri Feb 18 13:50:51 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 18 Feb 2022 13:50:51 GMT Subject: RFR: 8281472: JVM options processing silently truncates large illegal options values In-Reply-To: References: Message-ID: On Fri, 18 Feb 2022 13:27:40 GMT, Harold Seigel wrote: >> src/hotspot/share/runtime/arguments.cpp line 874: >> >>> 872: if (v > (julong)max_juint + 1) { >>> 873: return false; >>> 874: } >> >> This seems very suspicious. Where is the code that depends on this? > > Test test/hotspot/jtreg/gc/arguments/TestParallelGCThreads.java has the following code: > > > // 4294967295 == (unsigned int) -1 > // So setting ParallelGCThreads=4294967295 should give back 4294967295 > // and setting ParallelGCThreads=4294967296 should give back 0. 
(SerialGC is ok with ParallelGCThreads=0) > for (long i = 4294967295L; i <= 4294967296L; i++) { > long count = getParallelGCThreadCount( > "-XX:+UseSerialGC", > "-XX:ParallelGCThreads=" + i, > "-XX:+PrintFlagsFinal", > "-version"); > Asserts.assertEQ(count, i % 4294967296L, "Specifying ParallelGCThreads=" + i + " does not set the thread count properly!"); > } > } That test seems bizarre to me - perhaps someone from GC team can comment on why it expects to see what it sees. But I would not suggest we cripple the argument processing logic just because one test expects it to behave in a strange way. ------------- PR: https://git.openjdk.java.net/jdk/pull/7522 From redestad at openjdk.java.net Fri Feb 18 15:57:40 2022 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 18 Feb 2022 15:57:40 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v4] In-Reply-To: References: Message-ID: > I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range which only has positive bytes. This allows for dealing much more efficiently with those `byte[]`s that has a ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input. > > Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904 > > - Only implemented on x86 for now, but I want to verify that implementations of `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. This pretty much means holding up this until it's implemented on all platforms, which can either contributed to this PR or as dependent follow-ups. 
> > - An alternative to holding up until all platforms are on board is to allow the implementation of `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented. One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field. > > - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`. Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even see a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix). Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Switch aarch64 intrinsic to a variant of countPositives returning len or zero as a first step. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7231/files - new: https://git.openjdk.java.net/jdk/pull/7231/files/531139a1..a5e28b32 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=02-03 Stats: 59 lines in 8 files changed: 7 ins; 14 del; 38 mod Patch: https://git.openjdk.java.net/jdk/pull/7231.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7231/head:pull/7231 PR: https://git.openjdk.java.net/jdk/pull/7231 From iklam at openjdk.java.net Fri Feb 18 18:50:54 2022 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 18 Feb 2022 18:50:54 GMT Subject: RFR: 8281472: JVM options processing silently truncates large illegal options values In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 19:09:26 GMT, Harold Seigel wrote: > Please review this change to fix JDK-8281472. The fix prevents truncation of large illegal option values by rejecting those values if they exceed the range of their type. For example, it rejects values of int options that are not between max_int and min_int. > > The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux-x64 and Windows-x64. > > Thanks, Harold src/hotspot/share/runtime/arguments.cpp line 889: > 887: // -9223372036854775808. Negating intx_v for such values will erroneously > 888: // make them positive. > 889: if (is_neg && intx_v > 0) { I found it hard to reason with the casts such as `(uintx)(min_intx)`, even though they appear to be correct. 
I think this will be simpler and more readable:

    intx_v = (intx) v;
    if (is_neg) {
      intx_v = - intx_v;
      if (intx_v > 0) {
        return false; // underflow
      }
    } else {
      if (intx_v < 0) {
        return false; // overflow
      }
    }

------------- PR: https://git.openjdk.java.net/jdk/pull/7522
From iklam at openjdk.java.net Fri Feb 18 18:56:47 2022 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 18 Feb 2022 18:56:47 GMT Subject: RFR: 8281472: JVM options processing silently truncates large illegal options values In-Reply-To: References: Message-ID: <-6SnWhIU0I645Nd2EZZtOhJ4nbCx0VOwPjt23mhlco4=.8b1ac953-6f96-48b7-9a87-2b16e2f203f6@github.com> On Fri, 18 Feb 2022 13:47:19 GMT, David Holmes wrote:
>> Test test/hotspot/jtreg/gc/arguments/TestParallelGCThreads.java has the following code:
>>
>>
>> // 4294967295 == (unsigned int) -1
>> // So setting ParallelGCThreads=4294967295 should give back 4294967295
>> // and setting ParallelGCThreads=4294967296 should give back 0. (SerialGC is ok with ParallelGCThreads=0)
>> for (long i = 4294967295L; i <= 4294967296L; i++) {
>>   long count = getParallelGCThreadCount(
>>       "-XX:+UseSerialGC",
>>       "-XX:ParallelGCThreads=" + i,
>>       "-XX:+PrintFlagsFinal",
>>       "-version");
>>   Asserts.assertEQ(count, i % 4294967296L, "Specifying ParallelGCThreads=" + i + " does not set the thread count properly!");
>> }
>> }
>
> That test seems bizarre to me - perhaps someone from GC team can comment on why it expects to see what it sees. But I would not suggest we cripple the argument processing logic just because one test expects it to behave in a strange way.

Setting ParallelGCThreads=4294967296 isn't a reasonable real-life use case. There's no need to check for the JVM's behavior under this situation (which now becomes illegal). There's already a test for "SerialGC is ok with ParallelGCThreads=0" on line 100 of this file. So the block quoted by Harold should be removed.
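For context on the expected value `i % 4294967296L` in the quoted test: it matches what a narrowing conversion from a 64-bit value to a 32-bit unsigned type produces. The sketch below only demonstrates that language rule. `narrow_to_u32` is a hypothetical name, and whether the old option parsing narrowed in exactly this way is an assumption based on the issue title.

```cpp
#include <cstdint>

// Illustration only: converting a 64-bit value to a 32-bit unsigned type keeps
// the low 32 bits, i.e. the result is the value modulo 2^32.
uint32_t narrow_to_u32(uint64_t v) {
  return static_cast<uint32_t>(v);
}
```

So 4294967295 survives the conversion unchanged, while 4294967296 becomes 0, which is the pair of results the removed test block checked for.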
------------- PR: https://git.openjdk.java.net/jdk/pull/7522 From hseigel at openjdk.java.net Fri Feb 18 19:29:53 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 18 Feb 2022 19:29:53 GMT Subject: RFR: 8281472: JVM options processing silently truncates large illegal options values In-Reply-To: References: Message-ID: On Fri, 18 Feb 2022 18:47:30 GMT, Ioi Lam wrote: >> Please review this change to fix JDK-8281472. The fix prevents truncation of large illegal option values by rejecting those values if they exceed the range of their type. For example, it rejects values of int options that are not between max_int and min_int. >> >> The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux-x64 and Windows-x64. >> >> Thanks, Harold > > src/hotspot/share/runtime/arguments.cpp line 889: > >> 887: // -9223372036854775808. Negating intx_v for such values will erroneously >> 888: // make them positive. >> 889: if (is_neg && intx_v > 0) { > > I found it hard to reason with the casts such as `(uintx)(min_intx)`, even though they appear to be correct. I think this will be simpler and more readable: > > > intx_v = (intx) v; > if (is_neg) { > intx_v = - intx_v; > if (intx_v > 0) { > return false; // underflow > } > } else { > if (intx_v < 0) { > return false; // overflow > } > } That doesn't work for intx options set to min_intx, such as MaxJNILocalCapacity=-9223372036854775808. Perhaps min_intx should be special cased? 
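One possible way to special-case the minimum value, sketched under assumptions: `int64_t` stands in for `intx`, and `to_signed` is a hypothetical helper that is not taken from the pull request. The idea is to keep the magnitude unsigned and compare it against a sign-dependent limit, so the signed value is never negated when it has no positive counterpart.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: combine an unsigned magnitude and a sign flag into an
// int64_t, returning false on overflow or underflow.
bool to_signed(uint64_t magnitude, bool is_neg, int64_t* result) {
  const uint64_t max_pos =
      static_cast<uint64_t>(std::numeric_limits<int64_t>::max());
  if (is_neg) {
    // The most negative value has magnitude max_pos + 1, which cannot be
    // represented as a positive int64_t, so it is handled separately.
    if (magnitude > max_pos + 1) return false;  // underflow
    if (magnitude == max_pos + 1) {
      *result = std::numeric_limits<int64_t>::min();
    } else {
      *result = -static_cast<int64_t>(magnitude);
    }
    return true;
  }
  if (magnitude > max_pos) return false;  // overflow
  *result = static_cast<int64_t>(magnitude);
  return true;
}
```

With this shape, -9223372036854775808 is accepted, while -9223372036854775809 and 9223372036854775808 are both rejected.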
------------- PR: https://git.openjdk.java.net/jdk/pull/7522 From ioi.lam at oracle.com Fri Feb 18 19:31:20 2022 From: ioi.lam at oracle.com (Ioi Lam) Date: Fri, 18 Feb 2022 11:31:20 -0800 Subject: [RFC containers] 8281571 Do not use CPU Shares to compute active processor count Message-ID: <554c30f8-5d5d-8d98-4e1a-2883cf833f94@oracle.com> Hi Folks, I have filed the CSR https://bugs.openjdk.java.net/browse/JDK-8281571 Summary Modify HotSpot's Linux-only container detection code to not use CPU Shares (the "cpu.shares" file with cgroupv1 or "cpu.weight" file with cgroupv2, exposed through the CgroupSubsystem::cpu_shares() API) to limit the number of active processors that can be used by the JVM. Add a new flag (immediately deprecated), UseContainerCpuShares, to restore the old behaviour; and deprecate the existing PreferContainerQuotaForCPUCount flag. Please refer to the CSR for the reasons for making this change, as well as ways to address compatibility risks. If you have any concerns, please let me know. Otherwise I plan to move the CSR to "finalized" state and start RFR in two weeks. Thanks to Severin Gehwolf and David Holmes for contributing to the CSR. Best Regards - Ioi From duke at openjdk.java.net Sat Feb 19 08:04:52 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Sat, 19 Feb 2022 08:04:52 GMT Subject: RFR: 8281544: assert(VM_Version::supports_avx512bw()) failed for Tests jdk/incubator/vector/ [v2] In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 20:03:38 GMT, Nils Eliasson wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix indentation > > Looks good! Thanks @neliasso @TobiHartmann @vnkozlov for the reviews! 
------------- PR: https://git.openjdk.java.net/jdk/pull/7510 From shade at openjdk.java.net Mon Feb 21 06:03:50 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 21 Feb 2022 06:03:50 GMT Subject: RFR: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler In-Reply-To: References: Message-ID: On Tue, 15 Feb 2022 09:40:28 GMT, Aleksey Shipilev wrote: > Similar to [JDK-8281744](https://bugs.openjdk.java.net/browse/JDK-8281744), this change improves `TemplateInterpreterGenerator::generate_slow_signature_handler`: there are only a few moves between the jumps, and we can tell `MacroAssembler` those can be short. This code is used to process arguments after the slow call to VM, so the performance improvement is drowned by the call itself. This makes interpreter code a bit more compact, though. > > Additional testing: > - [x] Linux x86_64 fastdebug `hotspot:tier1` > - [x] Linux x86_32 fastdebug `hotspot:tier1` Anyone else? :) ------------- PR: https://git.openjdk.java.net/jdk/pull/7475 From dholmes at openjdk.java.net Mon Feb 21 06:17:49 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 21 Feb 2022 06:17:49 GMT Subject: RFR: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler In-Reply-To: References: Message-ID: On Tue, 15 Feb 2022 09:40:28 GMT, Aleksey Shipilev wrote: > Similar to [JDK-8281744](https://bugs.openjdk.java.net/browse/JDK-8281744), this change improves `TemplateInterpreterGenerator::generate_slow_signature_handler`: there are only a few moves between the jumps, and we can tell `MacroAssembler` those can be short. This code is used to process arguments after the slow call to VM, so the performance improvement is drowned by the call itself. This makes interpreter code a bit more compact, though. > > Additional testing: > - [x] Linux x86_64 fastdebug `hotspot:tier1` > - [x] Linux x86_32 fastdebug `hotspot:tier1` Seems quite reasonable based on the description. 
Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7475 From jiefu at openjdk.java.net Mon Feb 21 06:17:49 2022 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 21 Feb 2022 06:17:49 GMT Subject: RFR: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler In-Reply-To: References: Message-ID: On Tue, 15 Feb 2022 09:40:28 GMT, Aleksey Shipilev wrote: > Similar to [JDK-8281744](https://bugs.openjdk.java.net/browse/JDK-8281744), this change improves `TemplateInterpreterGenerator::generate_slow_signature_handler`: there are only a few moves between the jumps, and we can tell `MacroAssembler` those can be short. This code is used to process arguments after the slow call to VM, so the performance improvement is drowned by the call itself. This makes interpreter code a bit more compact, though. > > Additional testing: > - [x] Linux x86_64 fastdebug `hotspot:tier1` > - [x] Linux x86_32 fastdebug `hotspot:tier1` Please also update the copyright year. ------------- Marked as reviewed by jiefu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7475 From shade at openjdk.java.net Mon Feb 21 06:17:49 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 21 Feb 2022 06:17:49 GMT Subject: RFR: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler In-Reply-To: References: Message-ID: <-P4AB1UaiQODwFcuprTb3Mh2ChfJn6fZ6sct1x2aE58=.a12703e8-f46e-4e35-8ce2-ab8d08082380@github.com> On Tue, 15 Feb 2022 09:40:28 GMT, Aleksey Shipilev wrote: > Similar to [JDK-8281744](https://bugs.openjdk.java.net/browse/JDK-8281744), this change improves `TemplateInterpreterGenerator::generate_slow_signature_handler`: there are only a few moves between the jumps, and we can tell `MacroAssembler` those can be short. This code is used to process arguments after the slow call to VM, so the performance improvement is drowned by the call itself. This makes interpreter code a bit more compact, though.
> > Additional testing: > - [x] Linux x86_64 fastdebug `hotspot:tier1` > - [x] Linux x86_32 fastdebug `hotspot:tier1` Thank you! ------------- PR: https://git.openjdk.java.net/jdk/pull/7475 From shade at openjdk.java.net Mon Feb 21 06:17:50 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 21 Feb 2022 06:17:50 GMT Subject: Integrated: 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler In-Reply-To: References: Message-ID: On Tue, 15 Feb 2022 09:40:28 GMT, Aleksey Shipilev wrote: > Similar to [JDK-8281744](https://bugs.openjdk.java.net/browse/JDK-8281744), this change improves `TemplateInterpreterGenerator::generate_slow_signature_handler`: there are only a few moves between the jumps, and we can tell `MacroAssembler` those can be short. This code is used to process arguments after the slow call to VM, so the performance improvement is drowned by the call itself. This makes interpreter code a bit more compact, though. > > Additional testing: > - [x] Linux x86_64 fastdebug `hotspot:tier1` > - [x] Linux x86_32 fastdebug `hotspot:tier1` This pull request has now been integrated. Changeset: d28b048f Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/d28b048f35d5893187076e853a4a898d5ca8b220 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod 8281815: x86: Use short jumps in TIG::generate_slow_signature_handler Reviewed-by: rrich, dholmes, jiefu ------------- PR: https://git.openjdk.java.net/jdk/pull/7475 From pli at openjdk.java.net Mon Feb 21 06:19:26 2022 From: pli at openjdk.java.net (Pengfei Li) Date: Mon, 21 Feb 2022 06:19:26 GMT Subject: RFR: 8183390: Fix and re-enable post loop vectorization [v4] In-Reply-To: References: Message-ID: > ### Background > > Post loop vectorization is a C2 compiler optimization in an experimental > VM feature called PostLoopMultiversioning. It transforms the range-check > eliminated post loop to a 1-iteration vectorized loop with vector mask. 
> This optimization was contributed by Intel in 2016 to support x86 AVX512 > masked vector instructions. However, it was disabled soon after an issue > was found. Due to insufficient maintenance in these years, multiple bugs > have been accumulated inside. But we (Arm) still think this is a useful > framework for vector mask support in C2 auto-vectorized loops, for both > x86 AVX512 and AArch64 SVE. Hence, we propose this to fix and re-enable > post loop vectorization. > > ### Changes in this patch > > This patch reworks post loop vectorization. The most significant change > is removing vector mask support in C2 x86 backend and re-implementing > it in the mid-end. With this, we can re-enable post loop vectorization > for platforms other than x86. > > Previous implementation hard-codes x86 k1 register as a reserved AVX512 > opmask register and defines two routines (setvectmask/restorevectmask) > to set and restore the value of k1. But after [JDK-8211251](https://bugs.openjdk.java.net/browse/JDK-8211251) which encodes > AVX512 instructions as unmasked by default, generated vector masks are > no longer used in AVX512 vector instructions. To fix incorrect codegen > and add vector mask support for more platforms, we turn to add a vector > mask input to C2 mid-end IRs. Specifically, we use a VectorMaskGenNode > to generate a mask and replace all Load/Store nodes in the post loop > into LoadVectorMasked/StoreVectorMasked nodes with that mask input. This > IR form is exactly the same to those which are used in VectorAPI mask > support. For now, we only add mask inputs for Load/Store nodes because > we don't have reduction operations supported in post loop vectorization. > After this change, the x86 k1 register is no longer reserved and can be > allocated when PostLoopMultiversioning is enabled. > > Besides this change, we have fixed a compiler crash and five incorrect > result issues with post loop vectorization. 
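The mask-based post loop described above can be pictured with a small scalar emulation. This is only a sketch under stated assumptions: `VLEN`, `maskGen` and `add` are hypothetical names invented here, the lane count of 8 is arbitrary, and plain Java loops stand in for what C2 would emit as `VectorMaskGen`, `LoadVectorMasked` and `StoreVectorMasked` nodes. It is not the code from the patch.

```java
// Scalar emulation of a masked vector post loop (illustrative only, not C2 code).
final class MaskedPostLoopSketch {
    // Hypothetical number of int lanes in one vector register.
    static final int VLEN = 8;

    // Emulates mask generation: lane l is active iff l < remaining.
    static boolean[] maskGen(int remaining) {
        boolean[] mask = new boolean[VLEN];
        for (int lane = 0; lane < VLEN; lane++) {
            mask[lane] = lane < remaining;
        }
        return mask;
    }

    // res[i] = a[i] + b[i], split into an unmasked main loop and a masked post loop.
    static void add(int[] a, int[] b, int[] res) {
        int n = res.length;
        int i = 0;
        // Main loop: full vectors, no mask needed.
        for (; i + VLEN <= n; i += VLEN) {
            for (int lane = 0; lane < VLEN; lane++) {
                res[i + lane] = a[i + lane] + b[i + lane];
            }
        }
        // Post loop: a single "vector iteration" whose mask covers the tail.
        boolean[] mask = maskGen(n - i);
        for (int lane = 0; lane < VLEN; lane++) {
            if (mask[lane]) { // masked load and masked store: inactive lanes are untouched
                res[i + lane] = a[i + lane] + b[i + lane];
            }
        }
    }

    public static void main(String[] args) {
        int n = 21; // 2 full vectors plus a tail of 5 elements
        int[] a = new int[n];
        int[] b = new int[n];
        int[] res = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        add(a, b, res);
        System.out.println(java.util.Arrays.toString(res));
    }
}
```

Because the mask is an ordinary value that the loads and stores take as input, nothing in this picture depends on a reserved hardware register, which is the property the mid-end rework relies on.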
> > **I) C2 crashes with segmentation fault in strip-mined loops** > > Previous implementation was done before C2 loop strip-mining was merged > into JDK master so it didn't take strip-mined loops into consideration. > In C2's strip mined loops, post loop is not the sibling of the main loop > in ideal loop tree. Instead, it's the sibling of the main loop's parent. > This patch fixed a SIGSEGV issue caused by NULL pointer when locating > post loop from strip-mined main loop. > > **II) Incorrect result issues with post loop vectorization** > > We have also fixed five incorrect vectorization issues. Some of them are > hidden deep and can only be reproduced with corner cases. These issues > have a common cause that it assumes the post loop can be vectorized if > the vectorization in corresponding main loop is successful. But in many > cases this assumption is wrong. Below are details. > > - **[Issue-1] Incorrect vectorization for partial vectorizable loops** > > This issue can be reproduced by below loop where only some operations in > the loop body are vectorizable. > > for (int i = 0; i < 10000; i++) { > res[i] = a[i] * b[i]; > k = 3 * k + 1; > } > > In the main loop, superword can work well if parts of the operations in > loop body are not vectorizable since those parts can be unrolled only. > But for post loops, we don't create vectors through combining scalar IRs > generated from loop unrolling. Instead, we are doing scalars to vectors > replacement for all operations in the loop body. Hence, all operations > should be either vectorized together or not vectorized at all. To fix > this kind of cases, we add an extra field "_slp_vector_pack_count" in > CountedLoopNode to record the eventual count of vector packs in the main > loop. This value is then passed to post loop and compared with post loop > pack count. Vectorization will be bailed out in post loop if it creates > more vector packs than in the main loop. 
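One plausible way to see why Issue-1 gives wrong results is sketched below. It assumes (this is an interpretation, not something the description states) that the non-vectorizable statement `k = 3 * k + 1` would be left as a scalar operation inside the single masked post-loop iteration and would therefore run once instead of once per remaining element. `VLEN`, `scalarTail` and `brokenMaskedTail` are hypothetical names for illustration.

```java
// Illustrates the Issue-1 failure mode with a scalar emulation (not C2 code).
final class PartialVectorizationSketch {
    // Hypothetical number of int lanes in one vector register.
    static final int VLEN = 8;

    // Reference semantics of the tail: each iteration updates res[i] and k.
    static int scalarTail(int[] a, int[] b, int[] res, int from, int k) {
        for (int i = from; i < res.length; i++) {
            res[i] = a[i] * b[i];
            k = 3 * k + 1;
        }
        return k;
    }

    // Assumed broken form: the multiply is handled for all remaining lanes in
    // one masked step, but the scalar recurrence on k is executed only once.
    // Requires res.length - from <= VLEN.
    static int brokenMaskedTail(int[] a, int[] b, int[] res, int from, int k) {
        int remaining = res.length - from;
        for (int lane = 0; lane < VLEN; lane++) {
            if (lane < remaining) {
                res[from + lane] = a[from + lane] * b[from + lane];
            }
        }
        k = 3 * k + 1; // runs once, not 'remaining' times
        return k;
    }

    public static void main(String[] args) {
        int n = 21;
        int from = 16; // tail of 5 elements
        int[] a = new int[n];
        int[] b = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = i + 1;
        }
        int kScalar = scalarTail(a, b, new int[n], from, 1);
        int kBroken = brokenMaskedTail(a, b, new int[n], from, 1);
        System.out.println("scalar k = " + kScalar + ", broken k = " + kBroken);
    }
}
```

In this emulation the array results agree while `k` does not, which matches the stated rule that a post loop must be vectorized as a whole or not at all.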
> > - **[Issue-2] Incorrect result in loops with growing-down vectors** > > This issue appears with growing-down vectors, that is, vectors that grow > to smaller memory address as the loop iterates. It can be reproduced by > below counting-up loop with negative scale value in array index. > > for (int i = 0; i < 10000; i++) { > a[MAX - i] = b[MAX - i]; > } > > Cause of this issue is that for a growing-down vector, generated vector > mask value has reversed vector-lane order so it masks incorrect vector > lanes. Note that if negative scale value appears in counting-down loops, > the vector will be growing up. With this rule, we fix the issue by only > allowing positive array index scales in counting-up loops and negative > array index scales in counting-down loops. This check is done with the > help of SWPointer by comparing scale values in each memory access in the > loop with loop stride value. > > - **[Issue-3] Incorrect result in manually unrolled loops** > > This issue can be reproduced by below manually unrolled loop. > > for (int i = 0; i < 10000; i += 2) { > c[i] = a[i] + b[i]; > c[i + 1] = a[i + 1] * b[i + 1]; > } > > In this loop, operations in the 2nd statement duplicate those in the 1st > statement with a small memory address offset. Vectorization in the main > loop works well in this case because C2 does further unrolling and pack > combination. But we cannot vectorize the post loop through replacement > from scalars to vectors because it creates duplicated vector operations. > To fix this, we restrict post loop vectorization to loops with stride > values of 1 or -1. > > - **[Issue-4] Incorrect result in loops with mixed vector element sizes** > > This issue is found after we enable post loop vectorization for AArch64. > It's reproducible by multiple array operations with different element > sizes inside a loop. On x86, there is no issue because the values of x86 > AVX512 opmasks only depend on which vector lanes are active. 
But AArch64 > is different - the values of SVE predicates also depend on lane size of > the vector. Hence, on AArch64 SVE, if a loop has mixed vector element > sizes, we should use different vector masks. For now, we just support > loops with only one vector element size, i.e., "int + float" vectors in > a single loop is ok but "int + double" vectors in a single loop is not > vectorizable. This fix also enables subword vectors support to make all > primitive type array operations vectorizable. > > - **[Issue-5] Incorrect result in loops with potential data dependence** > > This issue can be reproduced by below corner case on AArch64 only. > > for (int i = 0; i < 10000; i++) { > a[i] = x; > a[i + OFFSET] = y; > } > > In this case, two stores in the loop have data dependence if the OFFSET > value is smaller than the vector length. So we cannot do vectorization > through replacing scalars to vectors. But the main loop vectorization > in this case is successful on AArch64 because AArch64 has partial vector > load/store support. It splits vector fill with different values in lanes > to several smaller-sized fills. In this patch, we add additional data > dependence check for this kind of cases. The check is also done with the > help of SWPointer class. In this check, we require that every two memory > accesses (with at least one store) of the same element type (or subword > size) in the loop has the same array index expression. > > ### Tests > > So far we have tested full jtreg on both x86 AVX512 and AArch64 SVE with > experimental VM option "PostLoopMultiversioning" turned on. We found no > issue in all tests. We notice that those existing cases are not enough > because some of above issues are not spotted by them. We would like to > add some new cases but we found existing vectorization tests are a bit > cumbersome - golden results must be pre-calculated and hard-coded in the > test code for correctness verification. 
Thus, in this patch, we propose > a new vectorization testing framework. > > Our new framework brings a simpler way to add new cases. For a new test > case, we only need to create a new method annotated with "@Test". The > test runner will invoke each annotated method twice automatically. The first > time it runs in the interpreter, and the second time it is force-compiled by > C2. Then the two returned results are compared. So in this framework each > test method should return a primitive value or an array of primitives. > In this way, no extra verification code for vectorization correctness is > required. This test runner is still jtreg-based and takes advantage of > the jtreg WhiteBox API, which enables running test methods at specific > compilation levels. Each test class inside is also jtreg-based. It just > needs to inherit from the test runner class and run with two additional > options "-Xbootclasspath/a:." and "-XX:+WhiteBoxAPI". > > ### Summary & Future work > > In this patch, we reworked post loop vectorization. We made it platform > independent and fixed several issues inside. We also implemented a new > vectorization testing framework with many test cases inside. Meanwhile, > we did some code cleanups. > > This patch only touches C2 code guarded with PostLoopMultiversioning, > except for a few data structure changes. So, there's no behavior change when > experimental VM option PostLoopMultiversioning is off. Also, to reduce > risks, we still propose to keep post loop vectorization experimental for > now. But if it receives positive feedback, we would like to change it to > non-experimental in the future. Pengfei Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains six additional commits since the last revision: - Merge branch 'master' into postloop Change-Id: I503edb75f0f626569c776416bfef09651935979c - Update copyright year and rename a function Change-Id: I15845ebd3982edebd4c151284cc6f2ff727630bb - Merge branch 'master' into postloop Change-Id: Ie639c79c9cf016dc68ebf2c0031b60453b45e9a4 - Fix issues in newly added test framework Change-Id: I6e61abf05e9665325cb3abaf407360b18355c6b1 - Merge branch 'master' into postloop Change-Id: I9bb5a808d7540426dedb141fd198d25eb1f569e6 - 8183390: Fix and re-enable post loop vectorization
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6828/files - new: https://git.openjdk.java.net/jdk/pull/6828/files/56575886..ea0598ad Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6828&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6828&range=02-03 Stats: 57104 lines in 1757 files changed: 38847 ins; 10161 del; 8096 mod Patch: https://git.openjdk.java.net/jdk/pull/6828.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6828/head:pull/6828 PR: https://git.openjdk.java.net/jdk/pull/6828 From duke at openjdk.java.net Mon Feb 21 07:08:50 2022 From: duke at openjdk.java.net (Emanuel Peter) Date: Mon, 21 Feb 2022 07:08:50 GMT Subject: Integrated: 8281544: assert(VM_Version::supports_avx512bw()) failed for Tests jdk/incubator/vector/ In-Reply-To: References: Message-ID: <8CY3PKUGwuGTB6D-mzhR6218exbBBgA_4xR8n0joyx4=.461f0fd9-a069-4616-9a7e-25d716831d1b@github.com> On Thu, 17 Feb 2022 07:51:42 GMT, Emanuel Peter wrote: > `ZSaveLiveRegisters::ZSaveLiveRegisters` stores live registers, and later they are loaded again. > This includes opmask registers, which are part of AVX512. However, not all platforms have all of the AVX512 instructions. > For example Knights Landing has general AVX512 support and makes use of opmask registers, but does not support the AVX512 BW subset of instructions, specifically it does not support the `kmovql` instruction. Platforms like Cannon Lake have support for AVX512 BW.
> > Solution: in analogy to `RegisterSaver::save_live_registers`, which seems to perform a very similar task, use `MacroAssembler::kmov` instead of `kmovql` directly. Internally, `kmov` chooses `kmovql` if avx512bw is available, otherwise it uses `kmovwl`. > > As a regression test, I took one of the tests that failed with `-XX:+UnlockExperimentalVMOptions -XX:+UseZGC`, and added an additional `@run` statement with those flags. I simulated this test locally with Intel Software Development Emulator: > `sde -knl`: Knights Landing, AVX512 but not BW, fails without change to `kmov`, passes with it. > `sde -cnl`: Cannon Lake, has AVX512 BW, passes before and after code change. > > Ran additional tests to verify that the test triggers before code change, and that with the code change nothing broke. > > @neliasso Thanks for the help! This pull request has now been integrated. Changeset: 4e0b81c5 Author: Emanuel Peter Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/4e0b81c596f2a2eae49127b9ee98c80500b4e319 Stats: 14 lines in 2 files changed: 12 ins; 0 del; 2 mod 8281544: assert(VM_Version::supports_avx512bw()) failed for Tests jdk/incubator/vector/ Reviewed-by: kvn, neliasso, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/7510 From thartmann at openjdk.java.net Mon Feb 21 09:20:53 2022 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 21 Feb 2022 09:20:53 GMT Subject: RFR: 8271008: appcds/*/MethodHandlesAsCollectorTest.java tests time out because of excessive GC (CodeCache GC Threshold) in loom In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 13:34:39 GMT, Coleen Phillimore wrote: > In Loom there's a full heap walk when the sweeper is triggered. Many of the triggers in this test case are for the adapters created by the test, which are not deallocated.
Since there is a fall back to other code cache heap areas for NonNMethod and for NMethodProfiled, made the function CodeCache::reverse_free_ratio() examine the total code cache available rather than the specific area that it is allocating into. The compilation policy also uses this to increase the C1 compile threshold so also uses the entire free code cache size to calculate new threshold (ask @TobiHartmann about this). Thanks to Tobias for the discussion for this fix. > Tested with tier1-4. Looks good to me. Thanks for fixing this. src/hotspot/share/code/codeCache.cpp line 897: > 895: // Since code heap for each type of code blobs falls forward to the next > 896: // type of code heap, return the reverse free ratio for the entire > 897: // code heap. Suggestion: // Returns the reverse free ratio. E.g., if 25% (1/4) of the code cache // is free, reverse_free_ratio() returns 4. // Since code heap for each type of code blobs falls forward to the next // type of code heap, return the reverse free ratio for the entire // code cache. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7514 From duke at openjdk.java.net Mon Feb 21 12:25:46 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 21 Feb 2022 12:25:46 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v22] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. 
Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference beyond noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work-in-progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, all the changes mentioned above would be > required.
Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Error on -XX:-PreserveFramePointer -XX:UseBranchProtection=pac-ret ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/2062cce7..7f80f289 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=21 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=20-21 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Mon Feb 21 14:51:11 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Mon, 21 Feb 2022 14:51:11 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 Message-ID: Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. 
------------- Commit messages: - Fix AsyncGetCallTrace bug Changes: https://git.openjdk.java.net/jdk/pull/7559/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7559&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8282200 Stats: 25 lines in 2 files changed: 19 ins; 5 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7559.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7559/head:pull/7559 PR: https://git.openjdk.java.net/jdk/pull/7559 From duke at openjdk.java.net Mon Feb 21 15:02:53 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Mon, 21 Feb 2022 15:02:53 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 In-Reply-To: References: Message-ID: On Mon, 21 Feb 2022 14:43:27 GMT, Johannes Bechberger wrote: > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. Related to https://github.com/openjdk/jdk/pull/7193 ------------- PR: https://git.openjdk.java.net/jdk/pull/7559 From coleenp at openjdk.java.net Mon Feb 21 15:11:30 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 21 Feb 2022 15:11:30 GMT Subject: RFR: 8271008: appcds/*/MethodHandlesAsCollectorTest.java tests time out because of excessive GC (CodeCache GC Threshold) in loom [v2] In-Reply-To: References: Message-ID: > In Loom there's a full heap walk when the sweeper is triggered. Many of the triggers in this test case are for the adapters created by the test, which are not deallocated. Since there is a fall back to other code cache heap areas for NonNMethod and for NMethodProfiled, made the function CodeCache::reverse_free_ratio() examine the total code cache available rather than the specific area that it is allocating into. The compilation policy also uses this to increase the C1 compile threshold so also uses the entire free code cache size to calculate new threshold (ask @TobiHartmann about this). 
Thanks to Tobias for the discussion for this fix. > Tested with tier1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fixed comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7514/files - new: https://git.openjdk.java.net/jdk/pull/7514/files/7a69dc43..97b7a59c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7514&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7514&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7514.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7514/head:pull/7514 PR: https://git.openjdk.java.net/jdk/pull/7514 From coleenp at openjdk.java.net Mon Feb 21 15:11:31 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 21 Feb 2022 15:11:31 GMT Subject: RFR: 8271008: appcds/*/MethodHandlesAsCollectorTest.java tests time out because of excessive GC (CodeCache GC Threshold) in loom In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 13:34:39 GMT, Coleen Phillimore wrote: > In Loom there's a full heap walk when the sweeper is triggered. Many of the triggers in this test case are for the adapters created by the test, which are not deallocated. Since there is a fall back to other code cache heap areas for NonNMethod and for NMethodProfiled, made the function CodeCache::reverse_free_ratio() examine the total code cache available rather than the specific area that it is allocating into. The compilation policy also uses this to increase the C1 compile threshold so also uses the entire free code cache size to calculate new threshold (ask @TobiHartmann about this). Thanks to Tobias for the discussion for this fix. > Tested with tier1-4. Thanks, Tobias. 
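For readers following the `CodeCache::reverse_free_ratio()` change, here is a minimal sketch of the idea, not the actual HotSpot code. `CodeHeapStats` and the free function below are hypothetical names made up for illustration; the only assumption taken from the thread is that the ratio is total capacity divided by free space, computed over all code heaps rather than a single one.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for one segment (code heap) of a segmented code cache.
struct CodeHeapStats {
  std::size_t max_capacity;          // bytes reserved for this heap
  std::size_t unallocated_capacity;  // bytes still free in this heap
};

// Reverse free ratio over the whole code cache: total capacity divided by
// total free space. With 25% of the cache free, the result is 4.0.
double reverse_free_ratio(const CodeHeapStats* heaps, std::size_t count) {
  double max = 0.0;
  double unallocated = 0.0;
  for (std::size_t i = 0; i < count; ++i) {
    max += static_cast<double>(heaps[i].max_capacity);
    unallocated += static_cast<double>(heaps[i].unallocated_capacity);
  }
  // Clamp to at least one byte so a completely full cache does not divide by zero.
  unallocated = std::max(unallocated, 1.0);
  return max / unallocated;
}
```

Summing over all heaps means a nearly full segment no longer produces a large ratio on its own, as long as the segments it can fall back to still have room.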
------------- PR: https://git.openjdk.java.net/jdk/pull/7514 From coleenp at openjdk.java.net Mon Feb 21 15:11:33 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 21 Feb 2022 15:11:33 GMT Subject: RFR: 8271008: appcds/*/MethodHandlesAsCollectorTest.java tests time out because of excessive GC (CodeCache GC Threshold) in loom [v2] In-Reply-To: References: Message-ID: On Mon, 21 Feb 2022 09:15:05 GMT, Tobias Hartmann wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed comment > > src/hotspot/share/code/codeCache.cpp line 897: > >> 895: // Since code heap for each type of code blobs falls forward to the next >> 896: // type of code heap, return the reverse free ratio for the entire >> 897: // code heap. > > Suggestion: > > // Returns the reverse free ratio. E.g., if 25% (1/4) of the code cache > // is free, reverse_free_ratio() returns 4. > // Since code heap for each type of code blobs falls forward to the next > // type of code heap, return the reverse free ratio for the entire > // code cache. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/7514 From eastig at amazon.co.uk Mon Feb 21 16:49:32 2022 From: eastig at amazon.co.uk (Astigeevich, Evgeny) Date: Mon, 21 Feb 2022 16:49:32 +0000 Subject: RFC: AArch64: Set Segmented CodeCache default size to 127M Message-ID: <614FA734-BE19-4F53-B7D4-AFC78A9F1DEE@amazon.com> Hi Andrew, Sorry for the late reply. It was half term time. Thank you for your feedback. > I have seen bug reports from customers mystified > at poor OpenJDK performance which have turned out > to be code cache thrashing. I think we have the case of code cache trashing. An application consumes ~90% of the code cache whatever the code cache is given. If the big code cache is given the set of hot methods becomes sparse. The size of the set is less than 32M. We have a few ideas to solve the trashing. > I'd like to see more information. 
What was the *average performance
> gain* of all your benchmarks?

Full dacapo results ('-' means benchmark's time decreased, '+' means increased):

+------------+-------------+------------------+-----------------+
| Bench      | New vs Base | COV base results | COV new results |
+------------+-------------+------------------+-----------------+
| tradebeans | -9.10%      | 11.99%           | 3.41%           |
| eclipse    | -3.57%      | 1.04%            | 0.91%           |
| tradesoap  | -3.03%      | 0.86%            | 0.46%           |
| tomcat     | -1.45%      | 0.99%            | 0.86%           |
| pmd        | -1.05%      | 0.62%            | 0.87%           |
| lusearch   | -0.81%      | 0.29%            | 0.39%           |
| zxing      | -0.46%      | 1.28%            | 0.82%           |
| biojava    | -0.04%      | 0.18%            | 0.19%           |
| jme        | 0.01%       | 0.01%            | 0.01%           |
| batik      | 0.08%       | 0.40%            | 0.45%           |
| luindex    | 0.42%       | 0.56%            | 0.70%           |
| fop        | 0.58%       | 1.18%            | 1.09%           |
| avrora     | 0.72%       | 2.05%            | 1.45%           |
| xalan      | 0.82%       | 2.82%            | 3.63%           |
| sunflow    | 4.57%       | 10.86%           | 10.84%          |
+------------+-------------+------------------+-----------------+

Each benchmark was run 10 times, 10 iterations per run. The result of the 10th iteration was used.
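Since the question was about the average gain, here is a small sketch that takes the plain arithmetic mean of the "New vs Base" column of the dacapo table above. The helper `mean_delta` and the constant `kDacapoDeltas` are names invented for this illustration, and an unweighted mean is only one possible summary; it ignores the COV columns entirely.

```cpp
#include <cstddef>
#include <vector>

// Arithmetic mean of per-benchmark time deltas, in percent.
// Negative values mean the benchmark's time decreased.
double mean_delta(const std::vector<double>& deltas) {
  if (deltas.empty()) {
    return 0.0;
  }
  double sum = 0.0;
  for (double d : deltas) {
    sum += d;
  }
  return sum / static_cast<double>(deltas.size());
}

// "New vs Base" column of the dacapo table, in table order.
const std::vector<double> kDacapoDeltas = {
    -9.10, -3.57, -3.03, -1.45, -1.05, -0.81, -0.46, -0.04,
    0.01,  0.08,  0.42,  0.58,  0.72,  0.82,  4.57};
```

For these 15 values the deltas sum to -12.31, so `mean_delta(kDacapoDeltas)` comes out at roughly -0.82, i.e. an average time reduction of about 0.8%.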
Renaissance results ('-' means benchmark's time decreased, '+' means increased):

+------------------+--------------+-------------------+-----------------+
| Bench            | New vs Base  | COV base results  | COV new results |
+------------------+--------------+-------------------+-----------------+
| scrabble         | -13.47%      | 7.01%             | 7.43%           |
| dotty            | -9.03%       | 1.77%             | 1.82%           |
| naive-bayes      | -4.14%       | 9.72%             | 8.94%           |
| finagle-http     | -3.93%       | 0.95%             | 0.83%           |
| finagle-chirper  | -2.75%       | 2.45%             | 3.09%           |
| movie-lens       | -1.79%       | 1.39%             | 1.12%           |
| scala-doku       | -1.72%       | 27.20%            | 29.54%          |
| als              | -1.64%       | 0.64%             | 1.24%           |
| par-mnemonics    | -1.09%       | 11.69%            | 11.39%          |
| rx-scrabble      | -0.98%       | 1.36%             | 0.36%           |
| future-genetic   | -0.95%       | 1.14%             | 2.06%           |
| log-regression   | -0.86%       | 0.99%             | 1.62%           |
| dec-tree         | -0.74%       | 1.52%             | 1.69%           |
| chi-square       | -0.51%       | 1.20%             | 0.85%           |
| mnemonics        | -0.05%       | 0.74%             | 0.75%           |
| fj-kmeans        | 0.01%        | 0.95%             | 0.90%           |
| page-rank        | 0.06%        | 1.02%             | 0.80%           |
| scala-stm-bench7 | 0.16%        | 6.90%             | 7.43%           |
| reactors         | 0.97%        | 28.07%            | 12.42%          |
| scala-kmeans     | 1.22%        | 0.88%             | 0.39%           |
| gauss-mix        | 1.70%        | 1.83%             | 1.42%           |
| akka-uct         | 4.30%        | 5.20%             | 9.94%           |
| philosophers     | 12.64%       | 18.43%            | 17.64%          |
+------------------+--------------+-------------------+-----------------+

Each benchmark was run 10 times, 180 seconds per run. The second half of each run's results was used.

I created https://bugs.openjdk.java.net/browse/JDK-8280872 "AArch64: Position non-nmethod segment in between profiled and non-profiled segments for 128M+ CodeCache". It should reduce the number of trampolines.

There are also:
https://bugs.openjdk.java.net/browse/JDK-8280152 "AArch64: Duplicated trampolines in C2 NMethod Stub Code section"
https://bugs.openjdk.java.net/browse/JDK-8280481 "Duplicated static stubs in NMethod Stub Code section"

Implementing them will improve code cache usage, but they won't fix the code cache thrashing.
Thanks, Evgeny On 11/02/2022, 16:30, "hotspot-dev on behalf of Andrew Haley" wrote: On 2/10/22 23:02, Astigeevich, Evgeny wrote: > We'd like to discuss a proposal for setting TieredCompilation Segmented CodeCache default size to 127M on AArch64 (https://bugs.openjdk.java.net/browse/JDK-8280150). I don't think so, at least not without a lot more information. This would halve the size of the code cache, potentially causing severe regressions in production. I have seen bug reports from customers mystified at poor OpenJDK performance which have turned out to be code cache thrashing. This is very hard to diagnose without making some inspired guesses at what the root cause may be. We'd be moving the threshold for cache exhaustion much closer to our default configuration. So, this is a trade off between a small expected gain and a much larger (but hopefully rare) loss. I'd like to see more information. What was the *average performance gain* of all your benchmarks? I don't think anyone is interested in cherry-picked best cases. A quick back-of-the-envelope calculation tells me that about 3.5% of the code cache is occupied by trampolines and the extra bytes used by far calls. However, many of the far calls are never needed; I don't have stats for that, but I'd guess about half of them. But given the (plausible?) assumption that the dynamic frequency of calls is the same as the static frequency, I wouldn't be surprised if the cost of trampoline calls is about 2% of the total instruction count, so it'd be nice to be rid of them if there were no cost; but there is a cost. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.
From duke at openjdk.java.net Mon Feb 21 17:20:57 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 21 Feb 2022 17:20:57 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v23] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. 
> > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Merge master ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/7f80f289..f9882ff1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=22 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=21-22 Stats: 39689 lines in 1308 files changed: 27145 ins; 6944 del; 5600 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From aph-open at littlepinkcloud.com Mon Feb 21 17:38:46 2022 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Mon, 21 Feb 2022 17:38:46 +0000 Subject: RFC: AArch64: Set Segmented CodeCache default size to 127M In-Reply-To: <614FA734-BE19-4F53-B7D4-AFC78A9F1DEE@amazon.com> References: <614FA734-BE19-4F53-B7D4-AFC78A9F1DEE@amazon.com> Message-ID: <663add00-20b5-814f-ed12-91f1079dc98a@littlepinkcloud.com> On 2/21/22 16:49, Astigeevich, Evgeny wrote: > Hi Andrew, > > Sorry for the late reply. It was half term time. > > Thank you for your feedback.
> >> I have seen bug reports from customers mystified >> at poor OpenJDK performance which have turned out >> to be code cache thrashing. > > I think we have the case of code cache trashing. An application consumes > ~90% of the code cache whatever the code cache is given. If the big code > cache is given the set of hot methods becomes sparse. The size of the set > is less than 32M. We have a few ideas to solve the trashing. Please forgive me, but this paragraph makes no sense to me. I have seen actual thrashing, where hot methods were being evicted and repeatedly recompiled. This thrashing was fixed by increasing the size of the code cache. I take your point about fragmentation, but it happens. >> I'd like to see more information. What was the *average performance >> gain* of all your benchmarks? > > Full dacapo results ('-' means benchmark's time decreased, '+' means increased): I worked it out myself. 0.8% gain, on a bunch of smallish benchmarks. Unknown loss on large programs. The results that went the other way were curious. That suggests to me that there may be some other factors in play. I wonder what they are. > I created https://bugs.openjdk.java.net/browse/JDK-8280872 "AArch64: Position non-nmethod segment in between profiled and non-profiled segments for 128M+ CodeCache". > There are also: > https://bugs.openjdk.java.net/browse/JDK-8280152 "AArch64: Duplicated trampolines in C2 NMethod Stub Code section" > https://bugs.openjdk.java.net/browse/JDK-8280481 "Duplicated static stubs in NMethod Stub Code section" Those look pretty uncontroversial: they won't help anything much if at all, but at least we know they won't regress anything. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dholmes at openjdk.java.net Mon Feb 21 20:40:56 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 21 Feb 2022 20:40:56 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 In-Reply-To: References: Message-ID: On Mon, 21 Feb 2022 14:43:27 GMT, Johannes Bechberger wrote: > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. I'm marking this as changes requested because I need to investigate further. A `shouldNotReachHere` should never be reached; if it can be reached, then the circumstances need to be investigated to see where the true problem lies. Thanks, David ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7559 From duke at openjdk.java.net Mon Feb 21 21:17:46 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Mon, 21 Feb 2022 21:17:46 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 In-Reply-To: References: Message-ID: On Mon, 21 Feb 2022 14:43:27 GMT, Johannes Bechberger wrote: > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. I'm willing to help... The described error is not dependent on the JVM being a debug build; I can also reproduce it with a release build by decreasing the sampling interval.
------------- PR: https://git.openjdk.java.net/jdk/pull/7559 From dholmes at openjdk.java.net Tue Feb 22 01:00:45 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 22 Feb 2022 01:00:45 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 In-Reply-To: References: Message-ID: On Mon, 21 Feb 2022 14:43:27 GMT, Johannes Bechberger wrote: > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. Please see updates to JBS issue and the draft PR here: https://github.com/openjdk/jdk/pull/7566 You can either take my changes, or hand over to me and I will use my PR. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/7559 From duke at openjdk.java.net Tue Feb 22 05:53:31 2022 From: duke at openjdk.java.net (KIRIYAMA Takuya) Date: Tue, 22 Feb 2022 05:53:31 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. [v4] In-Reply-To: References: Message-ID: > I think JFR should report an error message and jvm should shut down safely instead of gurantee failure. > > For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops jvm as below > by using JfrJavaSupport::abort(). > > [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... > > I modified StreamWriterHost not to call guarantee failure but to call JfrJavaSupport::abort(). 
> I added an argument to JfrJavaSupport::abort() which tells os::abort() not to put out core > because there is no space on device. > Could you please review the fix? KIRIYAMA Takuya has updated the pull request incrementally with one additional commit since the last revision: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7227/files - new: https://git.openjdk.java.net/jdk/pull/7227/files/561cce33..a6958ad6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7227&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7227&range=02-03 Stats: 20 lines in 3 files changed: 4 ins; 9 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/7227.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7227/head:pull/7227 PR: https://git.openjdk.java.net/jdk/pull/7227 From duke at openjdk.java.net Tue Feb 22 05:53:34 2022 From: duke at openjdk.java.net (KIRIYAMA Takuya) Date: Tue, 22 Feb 2022 05:53:34 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. [v3] In-Reply-To: <_Ozu_sZZUPH-7vFaMfyzJFBv5WxQicM9asfJ9dK_jzg=.e9c28828-ca74-49e6-b0e8-33e79ea8a086@github.com> References: <_Ozu_sZZUPH-7vFaMfyzJFBv5WxQicM9asfJ9dK_jzg=.e9c28828-ca74-49e6-b0e8-33e79ea8a086@github.com> Message-ID: <1zhYytRXsDS8Ph4K0yM6Xx9Z8vfLAX6daqg3XL8fOU4=.f904c8e0-e083-4c61-a84d-780b01fc9ab7@github.com> On Fri, 18 Feb 2022 11:17:40 GMT, Markus Grönlund wrote: >> KIRIYAMA Takuya has updated the pull request incrementally with one additional commit since the last revision: >> >> 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device.
> > src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp line 88: > >> 86: JavaThread* jt = JavaThread::current(); >> 87: ThreadInVMfromNative transition(jt); >> 88: JfrJavaSupport::abort(JfrJavaSupport::new_string(msg, jt), jt, false); > > Hi again Takuya, I'm sorry, but I should have noticed this earlier: > > I now see that the code needs to allocate a Java string oop to conform to the existing abort function signature, which caters to invocations from Java. Then abort() immediately strips out the c-string from the oop. To be correct, also headers for logging/log.hpp and runtime/thread.inline.hpp should need be included. > > I believe we can simplify this by updating the abort() signature so that we don't need to drag in those extra dependencies. Please see my following comment where I suggest a way to do this. > > Thanks for your patience > Markus Thank you for your valuable comments. I agree with you. I corrected this fix in accordance with your suggestions. ------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From shade at openjdk.java.net Tue Feb 22 07:02:11 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 22 Feb 2022 07:02:11 GMT Subject: RFR: 8282224: Correct TIG::bang_stack_shadow_pages comments Message-ID: When reviewing the RISC-V port of the change, I noticed the comment in the x86 code is worded incorrectly: // Record a new watermark, unless the update is above the safe limit. __ cmpptr(rsp, Address(thread, JavaThread::shadow_zone_safe_limit())); __ jccb(Assembler::belowEqual, L_done); Stacks grow downwards, so we are recording a new watermark *when* update is above the safe limit. 
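To make the direction of the comparison concrete, here is a small C++ model of the two quoted instructions. It is a sketch only: `ShadowZone` and `record_watermark` are hypothetical names, and the store of the stack pointer into the watermark after the branch is an assumption, since the quoted snippet stops at the `jccb`.

```cpp
#include <cstdint>

// Hypothetical model of the two per-thread values the snippet compares against.
struct ShadowZone {
  std::uintptr_t growth_watermark;  // last recorded stack pointer value
  std::uintptr_t safe_limit;        // at or below this address, do not record
};

// Models `cmpptr(rsp, safe_limit); jccb(belowEqual, L_done)`.
// Returns true if the watermark was updated.
bool record_watermark(ShadowZone& zone, std::uintptr_t sp) {
  if (sp <= zone.safe_limit) {
    return false;  // belowEqual: jump to L_done, skip the update
  }
  // Assumed fall-through path: sp is numerically above the safe limit,
  // so the new watermark is recorded.
  zone.growth_watermark = sp;
  return true;
}
```

Because the stack grows towards lower addresses, "above the safe limit" here means the stack has not yet grown down as far as the limit, which is exactly the case in which the update happens.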
------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/7569/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7569&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8282224 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7569.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7569/head:pull/7569 PR: https://git.openjdk.java.net/jdk/pull/7569 From duke at openjdk.java.net Tue Feb 22 08:43:47 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Tue, 22 Feb 2022 08:43:47 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 In-Reply-To: References: Message-ID: On Mon, 21 Feb 2022 14:43:27 GMT, Johannes Bechberger wrote: > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. To be frank, I would like to integrate your changes into mine, because I need a second PR for JDK to be able to write such issues in JBS on my own. To the PR itself: The main difference between both is that with my PR we say "this should not happen please check before if you really want this" and with your PR we don't. I liked your initial PR that threw an error for the normal case that we cannot call this method for a thread in an inconsistent state. As you stated in the comment in the method of your PR, it is only a special case for AsyncGetCallTrace. What is the downside of having to explicitly check for this special case when you need it and otherwise throw an error? ------------- PR: https://git.openjdk.java.net/jdk/pull/7559 From mgronlun at openjdk.java.net Tue Feb 22 11:31:53 2022 From: mgronlun at openjdk.java.net (Markus Grönlund) Date: Tue, 22 Feb 2022 11:31:53 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device.
[v4] In-Reply-To: References: Message-ID: <9c_NBxB2A64uPN0YgJVjN8L9WqPPVWk1xZeSt4XH8Lc=.28209567-dba1-4623-86e1-314dd506ada7@github.com> On Tue, 22 Feb 2022 05:53:31 GMT, KIRIYAMA Takuya wrote: >> I think JFR should report an error message and jvm should shut down safely instead of gurantee failure. >> >> For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops jvm as below >> by using JfrJavaSupport::abort(). >> >> [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) >> [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) >> [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... >> >> I modified StreamWriterHost not to call guarantee failure but to call JfrJavaSupport::abort(). >> I added a argument to JfrJavaSupport::abort() which tells os::abort() not to put out core >> because there is no space on device. >> Could you please review the fix? > > KIRIYAMA Takuya has updated the pull request incrementally with one additional commit since the last revision: > > 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. Looks good, thank you. ------------- Marked as reviewed by mgronlun (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7227 From tschatzl at openjdk.java.net Tue Feb 22 11:42:55 2022 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 22 Feb 2022 11:42:55 GMT Subject: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v4] In-Reply-To: References: Message-ID: On Tue, 8 Feb 2022 08:17:17 GMT, Christian Hagedorn wrote: >> When printing the native stack trace on Linux (mostly done for hs_err files), it only prints the method with its parameters and a relative offset in the method: >> >> Stack: [0x00007f6e01739000,0x00007f6e0183a000], sp=0x00007f6e01838110, free space=1020k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 >> V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec >> V [libjvm.so+0x8303ef] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 >> V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df >> V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69 >> V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d >> V [libjvm.so+0x12091c9] JavaThread::run()+0x167 >> V [libjvm.so+0x1206ada] Thread::call_run()+0x180 >> V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f >> >> This makes it sometimes difficult to see where exactly the methods were called from and sometimes almost impossible when there are multiple invocations of the same method within one method. 
>> >> This patch improves this by providing source information (filename + line number) to the native stack traces on Linux similar to what's already done on Windows (see [JDK-8185712](https://bugs.openjdk.java.net/browse/JDK-8185712)): >> >> Stack: [0x00007f34fca18000,0x00007f34fcb19000], sp=0x00007f34fcb17110, free space=1020k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 (c1_Compilation.cpp:607) >> V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec (c1_Compiler.cpp:250) >> V [libjvm.so+0x8303ef] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 (compileBroker.cpp:2291) >> V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df (compileBroker.cpp:1966) >> V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69 (compilerThread.cpp:59) >> V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d (thread.cpp:1297) >> V [libjvm.so+0x12091c9] JavaThread::run()+0x167 (thread.cpp:1280) >> V [libjvm.so+0x1206ada] Thread::call_run()+0x180 (thread.cpp:358) >> V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f (os_linux.cpp:705) >> >> For Linux, we need to parse the debug symbols which are generated by GCC in DWARF - a standardized debugging format. This patch adds support for DWARF 4, the default of GCC 10.x, for 32 and 64 bit architectures (tested with x86_32, x86_64 and AArch64). DWARF 5 is not supported as it was still experimental and not generated for HotSpot. However, newer GCC version may soon generate DWARF 5 by default in which case this parser either needs to be extended or the build of HotSpot configured to only emit DWARF 4. >> >> The code follows the parsing steps described in the official DWARF 4 spec: https://dwarfstd.org/doc/DWARF4.pdf >> I added references to the corresponding sections throughout the code. 
However, I tried to explain the steps from the DWARF spec directly in the code (method names, comments etc.). This allows to follow the code without the need to actually deep dive into the spec. >> >> The comments at the `Dwarf` class in the `elf.hpp` file explain in more detail how a DWARF file is structured and how the parsing algorithm works to get to the filename and line number information. There are more class comments throughout the `elf.hpp` file about how different DWARF sections are structured and how the parsing algorithm needs to fetch the required information. Therefore, I will not repeat the exact workings of the algorithm here but refer to the code comments. I've tried to add as much information as possible to improve the readability. >> >> Generally, I've tried to stay away from adding any assertions as this code is almost always executed when already processing a VM error. Instead, the DWARF parser aims to just exit gracefully and possibly omit source information for a stack frame instead of risking to stop writing the hs_err file when an assertion would have failed. To debug failures, `-Xlog:dwarf` can be used with `info`, `debug` or `trace` which provides logging messages throughout parsing. >> >> **Testing:** >> Apart from manual testing, I've added two kinds of tests: >> - A JTreg test: Spawns new VMs to let them crash in various ways. The test reads the created hs_err files to check if the DWARF parsing could correctly find the filename and line number. For normal HotSpot files, I could not check against hardcoded filenames and line numbers as they are subject to change (especially line number can quickly become different). I therefore just added some sanity checks in the form of "found a non-empty file" and "found a non-zero line number". On top of that, I added tests that let the VM crash in custom C files (which will not change). This enables an additional verification of hardcoded filenames and line numbers. 
>> - Gtests: Directly calling the `get_source()` method which initiates DWARF parsing. Tested some special cases, for example, having a buffer that is not big enough to store the filename. >> >> On top of that, there are also existing JTreg tests that call `-XX:NativeMemoryTracking=detail` which will print a native stack trace with the new source information. These tests were also run as part of the standard tier testing and can be considered as sanity tests for this implementation. >> >> To make tests work in our infrastructure or if some other setups want to have debug symbols at different locations, I've added support for an additional `_JVM_DWARF_PATH` environment variable. This variable can specify a path from which the DWARF symbol file should be read by the parser if the default locations do not contain debug symbols (required some `make` changes). This is similar to what's done on Windows with `_NT_SYMBOL_PATH`. The JTreg test, however, also works if there are no symbols available. In that case, the test just skips all the assertion checks for the filename and line number. >> >> I haven't run any specific performance testing as this new code is mainly executed when an error will exit the VM and only if symbol files are available (which is normally not the case when using Java release builds as a user). >> >> Special thanks to @tschatzl for giving me some pointers to start based on his knowledge from a DWARF 2 parser he once wrote in Pascal and for discussing approaches on how to retrieve the source information and to @erikj79 for providing help for the changes required for `make`! >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Make dwarf tag NOT_PRODUCT First pass, did not dive into details of the state machine yet. src/hotspot/share/utilities/elfFile.cpp line 319: > 317: } > 318: log_develop_info(dwarf)("No separate .debuginfo file for library %s. 
It already contains the required DWARF sections.", _filepath); > 319: _dwarf_file = new (std::nothrow) DwarfFile(_filepath); Would it be useful to explicitly bail out on a `nullptr` value here to avoid crashes below? src/hotspot/share/utilities/elfFile.cpp line 357: > 355: } > 356: > 357: strcpy(debug_pathname, _filepath); I'm always a bit uneasy using "raw" `strcpy` instead of `strncpy` and friends. The code seems to be correct though. src/hotspot/share/utilities/elfFile.cpp line 358: > 356: > 357: strcpy(debug_pathname, _filepath); > 358: char* last_slash = strrchr(debug_pathname, '/'); It's probably no big issue hardcoding the forward slash here instead of using `os::file_separator()` in this method. src/hotspot/share/utilities/elfFile.cpp line 407: > 405: bool ElfFile::load_dwarf_file_from_env_path_folder(const char* env_path, const char* folder, const char* debug_filename, const uint32_t crc) { > 406: char* debug_pathname = NEW_RESOURCE_ARRAY(char, strlen(env_path) + strlen(folder) + strlen(debug_filename) + 2); > 407: strcpy(debug_pathname, env_path); Similar to other resource allocations, this should bail out if the result is `nullptr`. src/hotspot/share/utilities/elfFile.cpp line 566: > 564: // http://sourceware.org/gdb/current/onlinedocs/gdb/Separate-Debug-Files.html#Separate-Debug-Files. > 565: uint32_t ElfFile::gnu_debuglink_crc32(uint32_t crc, uint8_t* buf, const size_t len) { > 566: crc = ~crc & 0xffffffff; The masks are unnecessary here but don't hurt. Feel free to keep. src/hotspot/share/utilities/elfFile.cpp line 576: > 574: log_develop_info(dwarf)("Open DWARF file: %s", filepath); > 575: _dwarf_file = new (std::nothrow) DwarfFile(filepath); > 576: if (!_dwarf_file->is_valid_dwarf_file()) { This should bail out if the `new` returned a `nullptr`. src/hotspot/share/utilities/elfFile.cpp line 686: > 684: } > 685: > 686: // We must align to twice the address size. Since alignment is based on address size? I.e. 
above, at the check whether addresses are correct, define address size and then multiply by 2 here. This would also make the condition above look nicer, i.e. move the `[NOT_]LP64` outside of the condition. src/hotspot/share/utilities/elfFile.cpp line 784: > 782: } > 783: > 784: if (!_reader.read_byte(&_header._address_size) || NOT_LP64(_header._address_size != 4) LP64_ONLY( _header._address_size != 8)) { Since this is the second time for the clause `|| NOT_LP64(_header._address_size != 4) LP64_ONLY( _header._address_size != 8)` maybe it is useful to make a constant out of the accepted address size somewhere instead of repeating this over and over. Its value could even be something like `sizeof(intptr_t)` or so. src/hotspot/share/utilities/elfFile.cpp line 814: > 812: log_develop_trace(dwarf)("Series of declarations [code, tag]:"); > 813: AbbreviationDeclaration declaration; > 814: bool found_matching_declaration = false; This variable is never used. Remove. src/hotspot/share/utilities/elfFile.cpp line 944: > 942: #else > 943: _reader.move_position(8); > 944: #endif Use `AddressSize` or similar here instead of the `#ifdef`. src/hotspot/share/utilities/elfFile.cpp line 1026: > 1024: break; > 1025: } else { > 1026: if (!_reader.move_position(4)) { Instead of hardcoding the `4` for lineptr/loclistptr/macptr/rangelistptr it would be nice to have a `DwarfOffset` constant of that value, since we only support 32 bit DWARF. src/hotspot/share/utilities/elfFile.cpp line 1070: > 1068: // reason, GCC is currently using version 3 as specified in the DWARF 3 spec for the line number program even though GCC should > 1069: // be using version 4 for DWARF 4 as it emits DWARF 4 by default. > 1070: return false; According to the specification (pg112): > `version (uhalf)` > A version number (see Appendix F). This number is specific to the line number information > and is independent of the DWARF version number.
So this is just fine - actually things may break if the code accepted version 4 here assuming that there are breaking differences. On the other hand Appendix F mentions that DWARF4 contains .debug_line information in version 4. src/hotspot/share/utilities/elfFile.cpp line 1121: > 1119: // _debug_line_offset + 10 (=sizeof(_unit_length) + sizeof(_version) + sizeof(_header_length)) + _header_length. > 1120: _header._file_names_offset = _reader.get_position(); > 1121: if (!_reader.set_position(shdr.sh_offset + _debug_line_offset + 10 + _header._header_length)) { I would prefer a constant for this magic `10`. Thank you for the documentation. src/hotspot/share/utilities/elfFile.hpp line 211: > 209: > 210: // Load the DWARF file (.debuginfo) that belongs to this file. > 211: bool load_dwarf_file(); It would be nice to summarize from which places this method tries to load the debug info to prevent the need for digging for it in the method implementation. src/hotspot/share/utilities/elfFile.hpp line 300: > 298: * - debug: Prints the results of the steps (1) - (4) together with the generated line information matrix. > 299: * - trace: Full logging information for intermediate states/results when parsing the DWARF file. > 300: */ Maybe add a comment that log output is only supported in non-product builds and the reason. ------------- Changes requested by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7126 From vlivanov at openjdk.java.net Tue Feb 22 11:49:26 2022 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 22 Feb 2022 11:49:26 GMT Subject: RFR: 8280901: MethodHandle::linkToNative stub is missing w/ -Xint [v2] In-Reply-To: References: Message-ID: <9wlliZMzFqrTmAOktwaMPDw95W_rVjZRTW0_RAmAhjo=.e8d7ed64-33f6-4295-ae2c-ee7fd8822319@github.com> > MethodHandle::linkToNative linker doesn't have a dedicated stub for interpreter. A stub for compiled code is shared and it is invoked through i2c stub when accessed from interpreter.
In interpreter-only mode, stubs for compiled code are not generated and linkToNative ends up in a broken state where `Method::_from_interpreted_entry` points to `i2c` stub while `Method::_from_compiled_entry` points to `c2i` stub. > > Proposed fix unconditionally generates a stub for `MethodHandle::linkToNative` case irrespective of whether it is an interpreter-only mode or not. > > Testing: test/jdk/java/foreign/ w/ -Xint Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Regression test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7459/files - new: https://git.openjdk.java.net/jdk/pull/7459/files/50f68960..17df1875 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7459&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7459&range=00-01 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7459.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7459/head:pull/7459 PR: https://git.openjdk.java.net/jdk/pull/7459 From vlivanov at openjdk.java.net Tue Feb 22 11:49:27 2022 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 22 Feb 2022 11:49:27 GMT Subject: RFR: 8280901: MethodHandle::linkToNative stub is missing w/ -Xint In-Reply-To: References: Message-ID: On Mon, 14 Feb 2022 13:40:32 GMT, Vladimir Ivanov wrote: > MethodHandle::linkToNative linker doesn't have a dedicated stub for interpreter. A stub for compiled code is shared and it is invoked through i2c stub when accessed from interpreter. In interpreter-only mode, stubs for compiled code are not generated and linkToNative ends up in a broken state where `Method::_from_interpreted_entry` points to `i2c` stub while `Method::_from_compiled_entry` points to `c2i` stub. > > Proposed fix unconditionally generates a stub for `MethodHandle::linkToNative` case irrespective of whether it is an interpreter-only mode or not.
> > Testing: test/jdk/java/foreign/ w/ -Xint Thanks for the reviews, Maurizio, Aleksey, and Vladimir. > maybe consider adding some extra test combinations in TestMatrix I decided to extend `test/jdk/java/foreign/TestDowncall.java` to run a single test in `-Xint` mode: ----------messages:(5/559)---------- command: testng -Xint ... -Dgenerator.sample.factor=100000 TestDowncall ... elapsed time (seconds): 1.031 ... ----------System.out:(7/249)---------- test TestDowncall.testDowncall(0, "f0_V__", VOID, [], []): success =============================================== java/foreign/TestDowncall.java Total tests run: 1, Passes: 1, Failures: 0, Skips: 0 =============================================== ------------- PR: https://git.openjdk.java.net/jdk/pull/7459 From dholmes at openjdk.java.net Tue Feb 22 12:05:50 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 22 Feb 2022 12:05:50 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 In-Reply-To: References: Message-ID: On Mon, 21 Feb 2022 14:43:27 GMT, Johannes Bechberger wrote: > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. I don't like unnecessary special-cases. I added the `ShouldNotReachHere()` due to flawed reasoning, so would like to remove it again and make the code look the way it would have if I had realized about AGCT at the time. Creating a new API just for AGCT to use is not necessary IMO. Cheers. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7559 From chagedorn at openjdk.java.net Tue Feb 22 12:22:52 2022 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 22 Feb 2022 12:22:52 GMT Subject: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v4] In-Reply-To: References: Message-ID: On Tue, 8 Feb 2022 08:17:17 GMT, Christian Hagedorn wrote: >> When printing the native stack trace on Linux (mostly done for hs_err files), it only prints the method with its parameters and a relative offset in the method: >> >> Stack: [0x00007f6e01739000,0x00007f6e0183a000], sp=0x00007f6e01838110, free space=1020k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 >> V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec >> V [libjvm.so+0x8303ef] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 >> V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df >> V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69 >> V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d >> V [libjvm.so+0x12091c9] JavaThread::run()+0x167 >> V [libjvm.so+0x1206ada] Thread::call_run()+0x180 >> V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f >> >> This makes it sometimes difficult to see where exactly the methods were called from and sometimes almost impossible when there are multiple invocations of the same method within one method. 
>> >> This patch improves this by providing source information (filename + line number) to the native stack traces on Linux similar to what's already done on Windows (see [JDK-8185712](https://bugs.openjdk.java.net/browse/JDK-8185712)): >> >> Stack: [0x00007f34fca18000,0x00007f34fcb19000], sp=0x00007f34fcb17110, free space=1020k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 (c1_Compilation.cpp:607) >> V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec (c1_Compiler.cpp:250) >> V [libjvm.so+0x8303ef] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 (compileBroker.cpp:2291) >> V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df (compileBroker.cpp:1966) >> V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69 (compilerThread.cpp:59) >> V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d (thread.cpp:1297) >> V [libjvm.so+0x12091c9] JavaThread::run()+0x167 (thread.cpp:1280) >> V [libjvm.so+0x1206ada] Thread::call_run()+0x180 (thread.cpp:358) >> V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f (os_linux.cpp:705) >> >> For Linux, we need to parse the debug symbols which are generated by GCC in DWARF - a standardized debugging format. This patch adds support for DWARF 4, the default of GCC 10.x, for 32 and 64 bit architectures (tested with x86_32, x86_64 and AArch64). DWARF 5 is not supported as it was still experimental and not generated for HotSpot. However, newer GCC version may soon generate DWARF 5 by default in which case this parser either needs to be extended or the build of HotSpot configured to only emit DWARF 4. >> >> The code follows the parsing steps described in the official DWARF 4 spec: https://dwarfstd.org/doc/DWARF4.pdf >> I added references to the corresponding sections throughout the code. 
However, I tried to explain the steps from the DWARF spec directly in the code (method names, comments etc.). This allows to follow the code without the need to actually deep dive into the spec. >> >> The comments at the `Dwarf` class in the `elf.hpp` file explain in more detail how a DWARF file is structured and how the parsing algorithm works to get to the filename and line number information. There are more class comments throughout the `elf.hpp` file about how different DWARF sections are structured and how the parsing algorithm needs to fetch the required information. Therefore, I will not repeat the exact workings of the algorithm here but refer to the code comments. I've tried to add as much information as possible to improve the readability. >> >> Generally, I've tried to stay away from adding any assertions as this code is almost always executed when already processing a VM error. Instead, the DWARF parser aims to just exit gracefully and possibly omit source information for a stack frame instead of risking to stop writing the hs_err file when an assertion would have failed. To debug failures, `-Xlog:dwarf` can be used with `info`, `debug` or `trace` which provides logging messages throughout parsing. >> >> **Testing:** >> Apart from manual testing, I've added two kinds of tests: >> - A JTreg test: Spawns new VMs to let them crash in various ways. The test reads the created hs_err files to check if the DWARF parsing could correctly find the filename and line number. For normal HotSpot files, I could not check against hardcoded filenames and line numbers as they are subject to change (especially line number can quickly become different). I therefore just added some sanity checks in the form of "found a non-empty file" and "found a non-zero line number". On top of that, I added tests that let the VM crash in custom C files (which will not change). This enables an additional verification of hardcoded filenames and line numbers. 
>> - Gtests: Directly calling the `get_source()` method which initiates DWARF parsing. Tested some special cases, for example, having a buffer that is not big enough to store the filename. >> >> On top of that, there are also existing JTreg tests that call `-XX:NativeMemoryTracking=detail` which will print a native stack trace with the new source information. These tests were also run as part of the standard tier testing and can be considered as sanity tests for this implementation. >> >> To make tests work in our infrastructure or if some other setups want to have debug symbols at different locations, I've added support for an additional `_JVM_DWARF_PATH` environment variable. This variable can specify a path from which the DWARF symbol file should be read by the parser if the default locations do not contain debug symbols (required some `make` changes). This is similar to what's done on Windows with `_NT_SYMBOL_PATH`. The JTreg test, however, also works if there are no symbols available. In that case, the test just skips all the assertion checks for the filename and line number. >> >> I haven't run any specific performance testing as this new code is mainly executed when an error will exit the VM and only if symbol files are available (which is normally not the case when using Java release builds as a user). >> >> Special thanks to @tschatzl for giving me some pointers to start based on his knowledge from a DWARF 2 parser he once wrote in Pascal and for discussing approaches on how to retrieve the source information and to @erikj79 for providing help for the changes required for `make`! >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Make dwarf tag NOT_PRODUCT Thank you Thomas for your first pass! I will probably get back to your comments on Monday as I'm taking the rest of the week off. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7126 From duke at openjdk.java.net Tue Feb 22 12:43:46 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Tue, 22 Feb 2022 12:43:46 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 In-Reply-To: References: Message-ID: On Mon, 21 Feb 2022 14:43:27 GMT, Johannes Bechberger wrote: > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. Good to know. I will change my PR accordingly (if this is ok for you) :) ------------- PR: https://git.openjdk.java.net/jdk/pull/7559 From eosterlund at openjdk.java.net Tue Feb 22 13:45:50 2022 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Tue, 22 Feb 2022 13:45:50 GMT Subject: RFR: 8271008: appcds/*/MethodHandlesAsCollectorTest.java tests time out because of excessive GC (CodeCache GC Threshold) in loom [v2] In-Reply-To: References: Message-ID: On Mon, 21 Feb 2022 15:11:30 GMT, Coleen Phillimore wrote: >> In Loom there's a full heap walk when the sweeper is triggered. Many of the triggers in this test case are for the adapters created by the test, which are not deallocated. Since there is a fall back to other code cache heap areas for NonNMethod and for NMethodProfiled, made the function CodeCache::reverse_free_ratio() examine the total code cache available rather than the specific area that it is allocating into. The compilation policy also uses this to increase the C1 compile threshold so also uses the entire free code cache size to calculate new threshold (ask @TobiHartmann about this). Thanks to Tobias for the discussion for this fix. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fixed comment Looks good. ------------- Marked as reviewed by eosterlund (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/7514 From coleenp at openjdk.java.net Tue Feb 22 13:45:50 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 22 Feb 2022 13:45:50 GMT Subject: RFR: 8271008: appcds/*/MethodHandlesAsCollectorTest.java tests time out because of excessive GC (CodeCache GC Threshold) in loom [v2] In-Reply-To: References: Message-ID: <2guikodmogzI8BO3HPTZnJ7_RcqG3iYcErVVY_zpH-Y=.218ee8eb-dccc-4f9d-b95b-b9a70474a723@github.com> On Mon, 21 Feb 2022 15:11:30 GMT, Coleen Phillimore wrote: >> In Loom there's a full heap walk when the sweeper is triggered. Many of the triggers in this test case are for the adapters created by the test, which are not deallocated. Since there is a fall back to other code cache heap areas for NonNMethod and for NMethodProfiled, made the function CodeCache::reverse_free_ratio() examine the total code cache available rather than the specific area that it is allocating into. The compilation policy also uses this to increase the C1 compile threshold so also uses the entire free code cache size to calculate new threshold (ask @TobiHartmann about this). Thanks to Tobias for the discussion for this fix. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fixed comment Thanks Erik! ------------- PR: https://git.openjdk.java.net/jdk/pull/7514 From coleenp at openjdk.java.net Tue Feb 22 13:45:51 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 22 Feb 2022 13:45:51 GMT Subject: Integrated: 8271008: appcds/*/MethodHandlesAsCollectorTest.java tests time out because of excessive GC (CodeCache GC Threshold) in loom In-Reply-To: References: Message-ID: <0I7-ShhwDMgvG1YqlMOpi4YsyAlRpUo5dAOyuZxCocY=.33864e88-338a-4215-adb8-e520cdb85110@github.com> On Thu, 17 Feb 2022 13:34:39 GMT, Coleen Phillimore wrote: > In Loom there's a full heap walk when the sweeper is triggered. 
Many of the triggers in this test case are for the adapters created by the test, which are not deallocated. Since there is a fall back to other code cache heap areas for NonNMethod and for NMethodProfiled, made the function CodeCache::reverse_free_ratio() examine the total code cache available rather than the specific area that it is allocating into. The compilation policy also uses this to increase the C1 compile threshold so also uses the entire free code cache size to calculate new threshold (ask @TobiHartmann about this). Thanks to Tobias for the discussion for this fix. > Tested with tier1-4. This pull request has now been integrated. Changeset: 022d8070 Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/022d80707c346f4b82ac1eb53e77c634769631e9 Stats: 36 lines in 5 files changed: 6 ins; 10 del; 20 mod 8271008: appcds/*/MethodHandlesAsCollectorTest.java tests time out because of excessive GC (CodeCache GC Threshold) in loom Reviewed-by: thartmann, eosterlund ------------- PR: https://git.openjdk.java.net/jdk/pull/7514 From duke at openjdk.java.net Tue Feb 22 14:05:22 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Tue, 22 Feb 2022 14:05:22 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v24] In-Reply-To: References: Message-ID: <3qZYWOySJyCqIHpvp9zg-Co3mBJt19hO5HwTK8NJjIE=.5d70fbfe-16db-45dc-92c0-058b50cb2955@github.com> > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. 
Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required.
Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Merge master ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6334/files - new: https://git.openjdk.java.net/jdk/pull/6334/files/f9882ff1..97ae934b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=23 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=22-23 Stats: 274 lines in 17 files changed: 176 ins; 69 del; 29 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Tue Feb 22 14:35:19 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Tue, 22 Feb 2022 14:35:19 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v25] In-Reply-To: References: Message-ID: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this. By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. 
A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: - Merge master - Merge master - Merge master - Error on -XX:-PreserveFramePointer -XX:UseBranchProtection=pac-ret - Add comments to enter calls - Set PreserveFramePointer if use_rop_protection is set - Merge enter_subframe into enter - Review fixups - Documentation updates - Update copyrights to 2022 - ...
and 24 more: https://git.openjdk.java.net/jdk/compare/022d8070...c4e0ee31 ------------- Changes: https://git.openjdk.java.net/jdk/pull/6334/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6334&range=24 Stats: 1481 lines in 35 files changed: 574 ins; 32 del; 875 mod Patch: https://git.openjdk.java.net/jdk/pull/6334.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6334/head:pull/6334 PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Tue Feb 22 18:29:21 2022 From: duke at openjdk.java.net (Vamsi Parasa) Date: Tue, 22 Feb 2022 18:29:21 GMT Subject: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long Message-ID: Optimizes the divideUnsigned() and remainderUnsigned() methods in java.lang.Integer and java.lang.Long classes using x86 intrinsics. This change shows a 3x improvement for Integer methods and up to a 25% improvement for Long. This change also implements the DivMod optimization which fuses division and modulus operations if needed. The DivMod optimization shows a 3x improvement for Integer and a ~65% improvement for Long.
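For readers less familiar with the methods being intrinsified, here is a minimal Java sketch of what `divideUnsigned` and `remainderUnsigned` compute at the library level. It only illustrates the semantics and the back-to-back division/remainder pattern that a DivMod fusion would target; it says nothing about the code the intrinsics actually emit. The class name `UnsignedDivDemo` and the helpers `divModUnsignedInt` and `divModUnsignedLong` are hypothetical names made up for this example.

```java
public class UnsignedDivDemo {
    // Returns {quotient, remainder}, treating both operands as unsigned 32-bit values.
    static int[] divModUnsignedInt(int dividend, int divisor) {
        int q = Integer.divideUnsigned(dividend, divisor);
        int r = Integer.remainderUnsigned(dividend, divisor);
        return new int[] { q, r };
    }

    // Returns {quotient, remainder}, treating both operands as unsigned 64-bit values.
    static long[] divModUnsignedLong(long dividend, long divisor) {
        long q = Long.divideUnsigned(dividend, divisor);
        long r = Long.remainderUnsigned(dividend, divisor);
        return new long[] { q, r };
    }

    public static void main(String[] args) {
        // -2 has the bit pattern 0xFFFFFFFE, i.e. 4294967294 when read as unsigned.
        int[] i = divModUnsignedInt(-2, 3);
        System.out.println(Integer.toUnsignedString(i[0]) + " rem " + Integer.toUnsignedString(i[1]));

        // -1L has all 64 bits set, i.e. 18446744073709551615 when read as unsigned.
        long[] l = divModUnsignedLong(-1L, 10L);
        System.out.println(Long.toUnsignedString(l[0]) + " rem " + Long.toUnsignedString(l[1]));

        // For contrast: signed division of the same int operands truncates toward zero.
        System.out.println(-2 / 3);
    }
}
```

With the inputs above, the unsigned int case yields quotient 1431655764 and remainder 2, the unsigned long case yields quotient 1844674407370955161 and remainder 5, and the signed division yields 0.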
------------- Commit messages: - fix trailing white space errors - fix whitespaces - revert comment to original for divmodI - Update rax and rdx register usage in x86_64.ad - 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long Changes: https://git.openjdk.java.net/jdk/pull/7572/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7572&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8282221 Stats: 741 lines in 16 files changed: 738 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572 From dholmes at openjdk.java.net Tue Feb 22 21:15:46 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 22 Feb 2022 21:15:46 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 In-Reply-To: References: Message-ID: On Mon, 21 Feb 2022 14:43:27 GMT, Johannes Bechberger wrote: > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. Please do update. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/7559 From sviswanathan at openjdk.java.net Wed Feb 23 01:34:53 2022 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 23 Feb 2022 01:34:53 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v6] In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 17:43:43 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. >> - Test creation using new IR testing framework. 
>> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- >> 1024.00 | 510.41 | 1811.66 | 3.55 | 510.40 | 502.65 | 0.98 >> 2048.00 | 293.52 | 984.37 | 3.35 | 304.96 | 177.88 | 0.58 >> 1024.00 | 825.94 | 3387.64 | 4.10 | 750.77 | 1925.15 | 2.56 >> 2048.00 | 411.91 | 1942.87 | 4.72 | 412.22 | 1034.13 | 2.51 >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8279508: Fixing for windows failure. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4146: > 4144: vaddpd(xtmp1, src , xtmp1, vec_enc); > 4145: vrndscalepd(dst, xtmp1, 0x4, vec_enc); > 4146: evcvtpd2qq(dst, dst, vec_enc); Why do we need vrndscalepd in between, could we not directly use cvtpd2qq after vaddpd? ------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From iklam at openjdk.java.net Wed Feb 23 03:57:16 2022 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 23 Feb 2022 03:57:16 GMT Subject: RFR: 8275731: CDS archived enums objects are recreated at runtime [v5] In-Reply-To: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> References: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> Message-ID: > **Background:** > > In the Java Language, Enums can be tested for equality, so the constants in an Enum type must be unique. Javac compiles an enum declaration like this: > > > public enum Day { SUNDAY, MONDAY ... } > > > to > > > public class Day extends java.lang.Enum { > public static final SUNDAY = new Day("SUNDAY"); > public static final MONDAY = new Day("MONDAY"); ... 
> } > > > With CDS archived heap objects, `Day::<clinit>` is executed twice: once during `java -Xshare:dump`, and once during normal JVM execution. If the archived heap objects reference one of the Enum constants created at dump time, we will violate the uniqueness requirements of the Enum constants at runtime. See the test case in the description of [JDK-8275731](https://bugs.openjdk.java.net/browse/JDK-8275731) > > **Fix:** > > During -Xshare:dump, if we discover that an Enum constant of type X is archived, we archive all constants of type X. At runtime, type X will skip the normal execution of `X::<clinit>`. Instead, we run `HeapShared::initialize_enum_klass()` to retrieve all the constants of X that were saved at dump time. > > This is safe as we know that `X::<clinit>` has no observable side effect -- it only creates the constants of type X, as well as the synthetic value `X::$VALUES`, which cannot be observed until X is fully initialized. > > **Verification:** > > To avoid future problems, I added a new tool, CDSHeapVerifier, to look for similar problems where the archived heap objects reference a static field that may be recreated at runtime. There are some manual steps involved, but I analyzed the potential problems found by the tool and they are all safe (after the current bug is fixed). See cdsHeapVerifier.cpp for gory details. An example trace of this tool can be found at https://bugs.openjdk.java.net/secure/attachment/97242/enum_warning.txt > > **Testing:** > > Passed Oracle CI tiers 1-4. Will run tier 5 as well. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains seven commits: - Fixed comments per @calvinccheung review - Merge branch 'master' into 8275731-heapshared-enum - Use InstanceKlass::do_local_static_fields for some field iterations - Merge branch 'master' into 8275731-heapshared-enum - added exclusions needed by "java -Xshare:dump -ea -esa" - Comments from @calvinccheung off-line - 8275731: CDS archived enums objects are recreated at runtime ------------- Changes: https://git.openjdk.java.net/jdk/pull/6653/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6653&range=04 Stats: 850 lines in 16 files changed: 807 ins; 2 del; 41 mod Patch: https://git.openjdk.java.net/jdk/pull/6653.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6653/head:pull/6653 PR: https://git.openjdk.java.net/jdk/pull/6653 From iklam at openjdk.java.net Wed Feb 23 04:15:28 2022 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 23 Feb 2022 04:15:28 GMT Subject: RFR: 8275731: CDS archived enums objects are recreated at runtime [v6] In-Reply-To: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> References: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> Message-ID: > **Background:** > > In the Java Language, Enums can be tested for equality, so the constants in an Enum type must be unique. Javac compiles an enum declaration like this: > > > public enum Day { SUNDAY, MONDAY ... } > > > to > > > public class Day extends java.lang.Enum { > public static final SUNDAY = new Day("SUNDAY"); > public static final MONDAY = new Day("MONDAY"); ... > } > > > With CDS archived heap objects, `Day::<clinit>` is executed twice: once during `java -Xshare:dump`, and once during normal JVM execution. If the archived heap objects reference one of the Enum constants created at dump time, we will violate the uniqueness requirements of the Enum constants at runtime.
See the test case in the description of [JDK-8275731](https://bugs.openjdk.java.net/browse/JDK-8275731) > > **Fix:** > > During -Xshare:dump, if we discover that an Enum constant of type X is archived, we archive all constants of type X. At runtime, type X will skip the normal execution of `X::<clinit>`. Instead, we run `HeapShared::initialize_enum_klass()` to retrieve all the constants of X that were saved at dump time. > > This is safe as we know that `X::<clinit>` has no observable side effect -- it only creates the constants of type X, as well as the synthetic value `X::$VALUES`, which cannot be observed until X is fully initialized. > > **Verification:** > > To avoid future problems, I added a new tool, CDSHeapVerifier, to look for similar problems where the archived heap objects reference a static field that may be recreated at runtime. There are some manual steps involved, but I analyzed the potential problems found by the tool and they are all safe (after the current bug is fixed). See cdsHeapVerifier.cpp for gory details. An example trace of this tool can be found at https://bugs.openjdk.java.net/secure/attachment/97242/enum_warning.txt > > **Testing:** > > Passed Oracle CI tiers 1-4. Will run tier 5 as well.
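
As a reading aid for the Background section quoted above, here is a minimal Java sketch of the invariant at stake. It assumes nothing about CDS; `EnumIdentitySketch` and `isSunday` are hypothetical names used only for illustration. It shows the kind of identity comparison that user code relies on, which would silently return `false` if a second `Day.SUNDAY` object existed at runtime.

```java
class EnumIdentitySketch {
    enum Day { SUNDAY, MONDAY }

    // Typical user code: compares by identity (==). This is only valid
    // while each enum constant is represented by exactly one object.
    static boolean isSunday(Day d) {
        return d == Day.SUNDAY;
    }

    public static void main(String[] args) {
        // Every route to the constant must yield the same object.
        System.out.println(isSunday(Day.valueOf("SUNDAY")));
        System.out.println(isSunday(Day.values()[0]));
        System.out.println(isSunday(Day.MONDAY));
    }
}
```

If an archived heap object held a reference to a dump-time `SUNDAY` while the class initializer created a fresh one, the first two checks would fail for the archived reference even though both objects print as `SUNDAY`.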
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed whitespace ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/6653/files - new: https://git.openjdk.java.net/jdk/pull/6653/files/4764075e..c6e9be1d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6653&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6653&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6653.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6653/head:pull/6653 PR: https://git.openjdk.java.net/jdk/pull/6653 From dholmes at openjdk.java.net Wed Feb 23 04:33:06 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 23 Feb 2022 04:33:06 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic Message-ID: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Replace the common "atomic" switch+loop code chunks in the pd code with a shared version that uses Atomic::load/store. See details in the bug report that show how current code is actually replaced by `memcpy` (in some places at least) whereas the new code is not. Platforms affected: - all x86 - Zero - Windows Aarch64 - PPC Testing: tiers 1-3 Additional builds: tiers 4 and 5 - builds covered: x86 and Zero GHA - builds covered: Windows-Aarch64 The only build affected and not tested is PPC. It would be great if someone could take this for a spin on PPC. For platforms not affected by this change, i.e. those that already specialise the code, I make not claims regarding the atomicity or otherwise of those specialized versions. That would be for someone interested in those specific platforms to check out. 
Thanks, David ------------- Commit messages: - 8227369: pd_disjoint_words_atomic() needs to be atomic Changes: https://git.openjdk.java.net/jdk/pull/7567/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7567&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8227369 Stats: 88 lines in 5 files changed: 24 ins; 58 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/7567.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7567/head:pull/7567 PR: https://git.openjdk.java.net/jdk/pull/7567 From eosterlund at openjdk.java.net Wed Feb 23 04:33:06 2022 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 23 Feb 2022 04:33:06 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic In-Reply-To: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: On Tue, 22 Feb 2022 05:45:12 GMT, David Holmes wrote: > Replace the common "atomic" switch+loop code chunks in the pd code with a shared version that uses Atomic::load/store. > > See details in the bug report that show how current code is actually replaced by `memcpy` (in some places at least) whereas the new code is not. > > Platforms affected: > - all x86 > - Zero > - Windows Aarch64 > - PPC > > Testing: tiers 1-3 > Additional builds: tiers 4 and 5 > - builds covered: x86 and Zero > > GHA > - builds covered: Windows-Aarch64 > > The only build affected and not tested is PPC. It would be great if someone could take this for a spin on PPC. > > For platforms not affected by this change, i.e. those that already specialise the code, I make not claims regarding the atomicity or otherwise of those specialized versions. That would be for someone interested in those specific platforms to check out. > > Thanks, > David Looks good, thanks for fixing this. 
------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7567 From mikael at openjdk.java.net Wed Feb 23 04:49:44 2022 From: mikael at openjdk.java.net (Mikael Vidstedt) Date: Wed, 23 Feb 2022 04:49:44 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic In-Reply-To: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: On Tue, 22 Feb 2022 05:45:12 GMT, David Holmes wrote: > Replace the common "atomic" switch+loop code chunks in the pd code with a shared version that uses Atomic::load/store. > > See details in the bug report that show how current code is actually replaced by `memcpy` (in some places at least) whereas the new code is not. > > Platforms affected: > - all x86 > - Zero > - Windows Aarch64 > - PPC > > Testing: tiers 1-3 > Additional builds: tiers 4 and 5 > - builds covered: x86 and Zero > > GHA > - builds covered: Windows-Aarch64 > > The only build affected and not tested is PPC. It would be great if someone could take this for a spin on PPC. > > For platforms not affected by this change, i.e. those that already specialise the code, I make not claims regarding the atomicity or otherwise of those specialized versions. That would be for someone interested in those specific platforms to check out. > > Thanks, > David Nice! (Unrelated to/separate from your change I do wonder if the specialized assembly copy code on the "other" platforms actually is warranted. My memory from doing the conjoint copy (with optional swap) is that gcc generates really good code, but maybe there are platforms/toolchains/cases where that's not sufficient.) src/hotspot/share/utilities/copy.hpp line 302: > 300: > 301: protected: > 302: inline static void _shared_disjoint_words_atomic(const HeapWord* from, How about dropping the leading underscore prefix? 
------------- PR: https://git.openjdk.java.net/jdk/pull/7567 From dholmes at openjdk.java.net Wed Feb 23 05:13:51 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 23 Feb 2022 05:13:51 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic In-Reply-To: References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: On Wed, 23 Feb 2022 04:42:33 GMT, Mikael Vidstedt wrote: >> Replace the common "atomic" switch+loop code chunks in the pd code with a shared version that uses Atomic::load/store. >> >> See details in the bug report that show how current code is actually replaced by `memcpy` (in some places at least) whereas the new code is not. >> >> Platforms affected: >> - all x86 >> - Zero >> - Windows Aarch64 >> - PPC >> >> Testing: tiers 1-3 >> Additional builds: tiers 4 and 5 >> - builds covered: x86 and Zero >> >> GHA >> - builds covered: Windows-Aarch64 >> >> The only build affected and not tested is PPC. It would be great if someone could take this for a spin on PPC. >> >> For platforms not affected by this change, i.e. those that already specialise the code, I make not claims regarding the atomicity or otherwise of those specialized versions. That would be for someone interested in those specific platforms to check out. >> >> Thanks, >> David > > src/hotspot/share/utilities/copy.hpp line 302: > >> 300: >> 301: protected: >> 302: inline static void _shared_disjoint_words_atomic(const HeapWord* from, > > How about dropping the leading underscore prefix? Yep will do. Was originally intended (similar to other pd code) to indicate this was a private/internal API, but the protected status achieves the same thing. Thanks for looking at it and the help with the disassembly analysis. 
:) ------------- PR: https://git.openjdk.java.net/jdk/pull/7567 From dholmes at openjdk.java.net Wed Feb 23 05:38:34 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 23 Feb 2022 05:38:34 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic [v2] In-Reply-To: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: > Replace the common "atomic" switch+loop code chunks in the pd code with a shared version that uses Atomic::load/store. > > See details in the bug report that show how current code is actually replaced by `memcpy` (in some places at least) whereas the new code is not. > > Platforms affected: > - all x86 > - Zero > - Windows Aarch64 > - PPC > > Testing: tiers 1-3 > Additional builds: tiers 4 and 5 > - builds covered: x86 and Zero > > GHA > - builds covered: Windows-Aarch64 > > The only build affected and not tested is PPC. It would be great if someone could take this for a spin on PPC. > > For platforms not affected by this change, i.e. those that already specialise the code, I make not claims regarding the atomicity or otherwise of those specialized versions. That would be for someone interested in those specific platforms to check out. 
> > Thanks, > David David Holmes has updated the pull request incrementally with one additional commit since the last revision: Remove underscore from name ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7567/files - new: https://git.openjdk.java.net/jdk/pull/7567/files/a34aee31..46ecdd29 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7567&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7567&range=00-01 Stats: 6 lines in 5 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/7567.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7567/head:pull/7567 PR: https://git.openjdk.java.net/jdk/pull/7567 From jbhateja at openjdk.java.net Wed Feb 23 05:56:52 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 23 Feb 2022 05:56:52 GMT Subject: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long In-Reply-To: References: Message-ID: On Tue, 22 Feb 2022 09:24:47 GMT, Vamsi Parasa wrote: > Optimizes the divideUnsigned() and remainderUnsigned() methods in java.lang.Integer and java.lang.Long classes using x86 intrinsics. This change shows 3x improvement for Integer methods and up to 25% improvement for Long. This change also implements the DivMod optimization which fuses division and modulus operations if needed. The DivMod optimization shows 3x improvement for Integer and ~65% improvement for Long. src/hotspot/cpu/x86/x86_64.ad line 8602: > 8600: __ jmp(done); > 8601: __ bind(neg_divisor_fastpath); > 8602: // Fastpath for divisor < 0: Please move this into a macro assembly routine. src/hotspot/cpu/x86/x86_64.ad line 8633: > 8631: __ jmp(done); > 8632: __ bind(neg_divisor_fastpath); > 8633: // Fastpath for divisor < 0: Please move this into a macro assembly routine. src/hotspot/cpu/x86/x86_64.ad line 8722: > 8720: __ shrl(rax, 31); // quotient > 8721: __ sarl(tmp, 31); > 8722: __ andl(tmp, divisor); Please move this into a macro assembly routine.
src/hotspot/cpu/x86/x86_64.ad line 8763: > 8761: __ andnq(rax, rax, rdx); > 8762: __ movq(tmp, rax); > 8763: __ shrq(rax, 63); // quotient Please move this into a macro assembly routine. src/hotspot/cpu/x86/x86_64.ad line 8902: > 8900: __ subl(tmp_rax, divisor); > 8901: __ andnl(tmp_rax, tmp_rax, rdx); > 8902: __ sarl(tmp_rax, 31); Please move this into a macro assembly routine. src/hotspot/cpu/x86/x86_64.ad line 8932: > 8930: // Fastpath when divisor < 0: > 8931: // remainder = dividend - (((dividend & ~(dividend - divisor)) >> (Long.SIZE - 1)) & divisor) > 8932: // See Hacker's Delight (2nd ed), section 9.3 which is implemented in java.lang.Long.remainderUnsigned() Please move it into a macro assembly routine. src/hotspot/share/opto/compile.cpp line 3499: > 3497: Node* d = n->find_similar(Op_UDivI); > 3498: if (d) { > 3499: // Replace them with a fused unsigned divmod if supported Can you explain a bit here why this transformation can't be handled earlier? src/hotspot/share/opto/divnode.cpp line 1350: > 1348: return NULL; > 1349: } > 1350: Please remove the Value and Ideal routines if no explicit transforms are being done. src/hotspot/share/opto/divnode.cpp line 1362: > 1360: } > 1361: > 1362: //============================================================================= You can remove the Ideal routine if no transformation is being done. test/micro/org/openjdk/bench/java/lang/IntegerDivMod.java line 76: > 74: return quotients; > 75: } > 76: Return seems redundant here. test/micro/org/openjdk/bench/java/lang/IntegerDivMod.java line 83: > 81: } > 82: return remainders; > 83: } Return seems redundant here. test/micro/org/openjdk/bench/java/lang/LongDivMod.java line 75: > 73: } > 74: return quotients; > 75: } Do we need to return quotients, since it's a field being explicitly modified?
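
For reference, the negative-divisor fast path quoted in the x86_64.ad comment above (line 8932) can be checked in plain Java. This is a minimal sketch, not the intrinsic: `UnsignedNegDivisorSketch` and its method names are hypothetical. The remainder expression is copied from the quoted comment. The quotient expression is an assumption on my part, taken to be the matching Hacker's Delight form, and is validated here only by comparison with `Long.divideUnsigned`.

```java
class UnsignedNegDivisorSketch {
    // Precondition: divisor < 0 as a signed long, i.e. its unsigned value
    // is at least 2^63, so the unsigned quotient can only be 0 or 1.
    static long quotientNegDivisor(long dividend, long divisor) {
        // Assumed quotient form; the top bit is set exactly when
        // dividend >= divisor in unsigned terms.
        return (dividend & ~(dividend - divisor)) >>> (Long.SIZE - 1);
    }

    // Remainder formula as quoted in the review comment.
    static long remainderNegDivisor(long dividend, long divisor) {
        return dividend - (((dividend & ~(dividend - divisor)) >> (Long.SIZE - 1)) & divisor);
    }

    public static void main(String[] args) {
        long[] dividends = { 0L, 5L, -1L, -2L, Long.MIN_VALUE, Long.MAX_VALUE };
        long[] divisors = { -1L, -2L, Long.MIN_VALUE, Long.MIN_VALUE + 7 };
        for (long n : dividends) {
            for (long d : divisors) {
                boolean ok = quotientNegDivisor(n, d) == Long.divideUnsigned(n, d)
                          && remainderNegDivisor(n, d) == Long.remainderUnsigned(n, d);
                System.out.println(Long.toUnsignedString(n) + " / "
                        + Long.toUnsignedString(d) + " matches library: " + ok);
            }
        }
    }
}
```

Because the fast path is branch-free and uses only subtract, and-not, shift and and, it maps directly onto the `subl`/`andnl`/`sarl` and `andnq`/`shrq` sequences quoted in the comments.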
test/micro/org/openjdk/bench/java/lang/LongDivMod.java line 82: > 80: remainders[i] = Long.remainderUnsigned(dividends[i], divisors[i]); > 81: } > 82: return remainders; Same as above ------------- PR: https://git.openjdk.java.net/jdk/pull/7572 From mikael at openjdk.java.net Wed Feb 23 06:04:46 2022 From: mikael at openjdk.java.net (Mikael Vidstedt) Date: Wed, 23 Feb 2022 06:04:46 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic [v2] In-Reply-To: References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: <7JjwWxlH6KOZ0L76afxGufR2ZxCWbAwQXtYOSI-A1zE=.f21853f3-46da-423d-8358-0765e4617100@github.com> On Wed, 23 Feb 2022 05:38:34 GMT, David Holmes wrote: >> Replace the common "atomic" switch+loop code chunks in the pd code with a shared version that uses Atomic::load/store. >> >> See details in the bug report that show how current code is actually replaced by `memcpy` (in some places at least) whereas the new code is not. >> >> Platforms affected: >> - all x86 >> - Zero >> - Windows Aarch64 >> - PPC >> >> Testing: tiers 1-3 >> Additional builds: tiers 4 and 5 >> - builds covered: x86 and Zero >> >> GHA >> - builds covered: Windows-Aarch64 >> >> The only build affected and not tested is PPC. It would be great if someone could take this for a spin on PPC. >> >> For platforms not affected by this change, i.e. those that already specialise the code, I make not claims regarding the atomicity or otherwise of those specialized versions. That would be for someone interested in those specific platforms to check out. >> >> Thanks, >> David > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Remove underscore from name Marked as reviewed by mikael (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/7567 From stuefe at openjdk.java.net Wed Feb 23 07:22:07 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 23 Feb 2022 07:22:07 GMT Subject: RFR: JDK-8281015: Further simplify NMT backend [v2] In-Reply-To: References: Message-ID: <3YEOwRmxyanbFix2WLt8DMgR6cweiAJlR8SbL6AUFE8=.d44da1b1-fba6-423e-97d8-9106a52a4aa7@github.com> > NMT backend can be further simplified and cleaned up. > > - some entry points require NMT_TrackingLevel as arguments, some use the global tracking level. Ultimately, every part of NMT always uses the global tracking level, so in many cases the explicit parameter can be removed and the global tracking level can be used instead. > - `MemTracker::malloc_header_size(level)` + `MemTracker::malloc_footer_size(level)` are fused into `MemTracker::overhead_per_malloc()` > - when adding to `MallocSiteTable`, caller gets back a shortcut to the entry. That shortcut is stored verbatim in the malloc header. It consists of two 16-bit values (bucket index and chain position). That tuple finds its way into many argument lists. It can be simplified into a single 32-bit opaque marker. Code outside the MallocSiteTable does not need to know what it is. > - Currently, the `MallocHeader` class contains a lot of logic. It accounts (in constructor) and de-accounts (in `MallocHeader::release()`). It would simplify code if `MallocHeader` were just a dumb data carrier and the `MallocTracker` would do the actual work. > - `MallocHeader` can be simplified, almost all members made constant and modifying accessors removed. > - In some places we handle inputptr=NULL gracefully where we should assert instead > - Expressions like `MemTracker::tracking_level() != NMT_off` can be simplified to `MemTracker::enabled()`. > - MemTracker::malloc_base (all variants) can be removed. Note that we have MallocTracker::malloc_header, which achieves the same and does not require casting to the header.
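
The idea of folding the (bucket index, chain position) pair into one opaque 32-bit marker, described in the list above, can be illustrated with a small sketch. It is written in Java purely for illustration. The real code is HotSpot C++, and the class name, method names, and the choice of which half holds which value are all assumptions, not taken from the patch.

```java
class MstMarkerSketch {
    // Pack two 16-bit values into one 32-bit marker.
    // Assumed layout: bucket index in the high half, chain position in the low half.
    static int pack(int bucketIndex, int chainPos) {
        if (bucketIndex < 0 || bucketIndex > 0xFFFF || chainPos < 0 || chainPos > 0xFFFF) {
            throw new IllegalArgumentException("each component must fit in 16 bits");
        }
        return (bucketIndex << 16) | chainPos;
    }

    // Only the table itself would need these accessors; callers treat the
    // marker as an opaque number and simply pass it around.
    static int bucketIndex(int marker) {
        return marker >>> 16;
    }

    static int chainPos(int marker) {
        return marker & 0xFFFF;
    }

    public static void main(String[] args) {
        int marker = pack(511, 3);
        System.out.println(bucketIndex(marker) + " " + chainPos(marker));
    }
}
```

With this shape, argument lists outside the table carry a single integer instead of a pair, which is the simplification the description is after.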
> > Testing: > > - GHAs > - manually ran NMT gtests (all NMT modes) and NMT jtreg tests on Ubuntu x64 > - SAP nightlies ran through. Note that since 8275301 "Unify C-heap buffer overrun checks into NMT" NMT is enabled by default in debug builds, so it gets a lot more workout in tests now. > > Note that I wanted to manually verify that the gdb "call pp" command still works in order to not break Zhengyu's recent addition, but found its already broken. I filed https://bugs.openjdk.java.net/browse/JDK-8281023 and am preparing a separate patch. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - Zhengyus proposals - fix build error after merge (need const variant of malloc_header()) - merge master - pp should handle NULL correctly - remove mostly unused MallocTracker accessors for header members - Remove use of NMT level and simplify malloc+realloc+free - dumb down malloc header - mst bucket+pos=marker - remove malloc_base ------------- Changes: https://git.openjdk.java.net/jdk/pull/7283/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7283&range=01 Stats: 273 lines in 10 files changed: 56 ins; 147 del; 70 mod Patch: https://git.openjdk.java.net/jdk/pull/7283.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7283/head:pull/7283 PR: https://git.openjdk.java.net/jdk/pull/7283 From stuefe at openjdk.java.net Wed Feb 23 07:22:10 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 23 Feb 2022 07:22:10 GMT Subject: RFR: JDK-8281015: Further simplify NMT backend [v2] In-Reply-To: References: Message-ID: <9vTdCLUESqkKUPI4T2Zes4Hk2dU9IbGtJ8Q0I-ugAe4=.60a9181f-d3e7-4195-bfef-198bd1791552@github.com> On Thu, 17 Feb 2022 15:49:09 GMT, Zhengyu Gu wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: >> >> - Zhengyus proposals >> - fix build error after merge (need const variant of malloc_header()) >> - merge master >> - pp should handle NULL correctly >> - remove mostly unused MallocTracker accessors for header members >> - Remove use of NMT level and simplify malloc+realloc+free >> - dumb down malloc header >> - mst bucket+pos=marker >> - remove malloc_base > > Overall is good, a few minor comments. Thanks a lot, @zhengyu123, for your review. Sorry for the delay, I had vacation. I'll implement all your proposals excluding the last one (mst_marker as structure); see comment there. > src/hotspot/share/services/mallocTracker.hpp line 296: > >> 294: NOT_LP64(uint32_t _alt_canary); >> 295: const size_t _size; >> 296: const uint32_t _mst_marker; > > make mst_marker a struct? instead of opaque type. I played around a lot with different forms (struct, union) and in the end settled on an opaque uint32 since - it would be passed by value and I would have to provide that structure in all kind places, I got include circularities - I have a vague improvement in my head where we store the malloc site table entries not as individually malloced elements but in a (dynamically growing) array; that would mean we could address them by index without having to walk the bucket chains; and that index would be a simple number. ------------- PR: https://git.openjdk.java.net/jdk/pull/7283 From shade at openjdk.java.net Wed Feb 23 07:51:48 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 23 Feb 2022 07:51:48 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic [v2] In-Reply-To: References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: On Wed, 23 Feb 2022 05:38:34 GMT, David Holmes wrote: >> Replace the common "atomic" switch+loop code chunks in the pd code with a shared version that uses Atomic::load/store. 
>> >> See details in the bug report that show how current code is actually replaced by `memcpy` (in some places at least) whereas the new code is not. >> >> Platforms affected: >> - all x86 >> - Zero >> - Windows Aarch64 >> - PPC >> >> Testing: tiers 1-3 >> Additional builds: tiers 4 and 5 >> - builds covered: x86 and Zero >> >> GHA >> - builds covered: Windows-Aarch64 >> >> The only build affected and not tested is PPC. It would be great if someone could take this for a spin on PPC. >> >> For platforms not affected by this change, i.e. those that already specialise the code, I make not claims regarding the atomicity or otherwise of those specialized versions. That would be for someone interested in those specific platforms to check out. >> >> Thanks, >> David > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Remove underscore from name Looks fine. There might be some performance implications to this, as IIRC this code gets called from GC copying, so some light benchmarking might be in order. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7567 From kbarrett at openjdk.java.net Wed Feb 23 08:16:52 2022 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 23 Feb 2022 08:16:52 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic [v2] In-Reply-To: References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: On Wed, 23 Feb 2022 05:38:34 GMT, David Holmes wrote: >> Replace the common "atomic" switch+loop code chunks in the pd code with a shared version that uses Atomic::load/store. >> >> See details in the bug report that show how current code is actually replaced by `memcpy` (in some places at least) whereas the new code is not. 
>> >> Platforms affected: >> - all x86 >> - Zero >> - Windows Aarch64 >> - PPC >> >> Testing: tiers 1-3 >> Additional builds: tiers 4 and 5 >> - builds covered: x86 and Zero >> >> GHA >> - builds covered: Windows-Aarch64 >> >> The only build affected and not tested is PPC. It would be great if someone could take this for a spin on PPC. >> >> For platforms not affected by this change, i.e. those that already specialise the code, I make not claims regarding the atomicity or otherwise of those specialized versions. That would be for someone interested in those specific platforms to check out. >> >> Thanks, >> David > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Remove underscore from name Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7567 From jbhateja at openjdk.java.net Wed Feb 23 09:03:37 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 23 Feb 2022 09:03:37 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v7] In-Reply-To: References: Message-ID: > Summary of changes: > - Intrinsify Math.round(float) and Math.round(double) APIs. > - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. > - Test creation using new IR testing framework. > > Following are the performance number of a JMH micro included with the patch > > Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) > > > TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio > -- | -- | -- | -- | -- | -- | -- > 1024.00 | 510.41 | 1811.66 | 3.55 | 510.40 | 502.65 | 0.98 > 2048.00 | 293.52 | 984.37 | 3.35 | 304.96 | 177.88 | 0.58 > 1024.00 | 825.94 | 3387.64 | 4.10 | 750.77 | 1925.15 | 2.56 > 2048.00 | 411.91 | 1942.87 | 4.72 | 412.22 | 1034.13 | 2.51 > > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8279508: Review comments resolved. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7094/files - new: https://git.openjdk.java.net/jdk/pull/7094/files/f35ed9cf..6c869c76 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=05-06 Stats: 7 lines in 2 files changed: 0 ins; 3 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/7094.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094 PR: https://git.openjdk.java.net/jdk/pull/7094 From jbhateja at openjdk.java.net Wed Feb 23 09:03:39 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 23 Feb 2022 09:03:39 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v6] In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 01:31:24 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8279508: Fixing for windows failure. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4146: > >> 4144: vaddpd(xtmp1, src , xtmp1, vec_enc); >> 4145: vrndscalepd(dst, xtmp1, 0x4, vec_enc); >> 4146: evcvtpd2qq(dst, dst, vec_enc); > > Why do we need vrndscalepd in between, could we not directly use cvtpd2qq after vaddpd? Thanks @sviswa7 , when a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. DONE. 
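
To make the rounding point in this exchange concrete, here is a minimal Java sketch. It does not model MXCSR or the AVX-512 instructions: `Math.floor` stands in for a conversion performed under round-toward-negative-infinity, and `Math.rint` for one performed under the default round-to-nearest-even. The class and method names are hypothetical. For ordinary inputs, `Math.round(x)` behaves like floor(x + 0.5), so a conversion after the add gives the right result only if the rounding mode in effect rounds down.

```java
class RoundingModeSketch {
    // Add 0.5, then convert while rounding toward negative infinity.
    static long convertRoundingDown(double x) {
        return (long) Math.floor(x + 0.5);
    }

    // Add 0.5, then convert while rounding to nearest (ties to even).
    static long convertNearestEven(double x) {
        return (long) Math.rint(x + 0.5);
    }

    public static void main(String[] args) {
        double[] samples = { 2.2, 0.3, -2.5, 7.0 };
        for (double x : samples) {
            System.out.println(x + ": Math.round=" + Math.round(x)
                    + " roundDown=" + convertRoundingDown(x)
                    + " nearestEven=" + convertNearestEven(x));
        }
    }
}
```

For 2.2, for example, the nearest-even variant yields 3 while `Math.round` yields 2, which is why the rounding control in effect during the conversion matters.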
------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From mdoerr at openjdk.java.net Wed Feb 23 09:30:53 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 23 Feb 2022 09:30:53 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic [v2] In-Reply-To: References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: On Wed, 23 Feb 2022 05:38:34 GMT, David Holmes wrote: >> Replace the common "atomic" switch+loop code chunks in the pd code with a shared version that uses Atomic::load/store. >> >> See details in the bug report that show how current code is actually replaced by `memcpy` (in some places at least) whereas the new code is not. >> >> Platforms affected: >> - all x86 >> - Zero >> - Windows Aarch64 >> - PPC >> >> Testing: tiers 1-3 >> Additional builds: tiers 4 and 5 >> - builds covered: x86 and Zero >> >> GHA >> - builds covered: Windows-Aarch64 >> >> The only build affected and not tested is PPC. It would be great if someone could take this for a spin on PPC. >> >> For platforms not affected by this change, i.e. those that already specialise the code, I make not claims regarding the atomicity or otherwise of those specialized versions. That would be for someone interested in those specific platforms to check out. >> >> Thanks, >> David > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Remove underscore from name Works on PPC64. Note: This change may disturb loop optimizations which don't violate atomicity. Performance impact is possible. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7567 From dholmes at openjdk.java.net Wed Feb 23 11:09:51 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 23 Feb 2022 11:09:51 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic [v2] In-Reply-To: References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: On Wed, 23 Feb 2022 07:48:42 GMT, Aleksey Shipilev wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove underscore from name > > Looks fine. There might be some performance implications to this, as IIRC this code gets called from GC copying, so some light benchmarking might be in order. @shipilev any suggestions as to which benchmarks to try to run for this? Otherwise I'll just try our usual internal ones. ------------- PR: https://git.openjdk.java.net/jdk/pull/7567 From shade at openjdk.java.net Wed Feb 23 11:23:52 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 23 Feb 2022 11:23:52 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic [v2] In-Reply-To: References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: On Wed, 23 Feb 2022 07:48:42 GMT, Aleksey Shipilev wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove underscore from name > > Looks fine. There might be some performance implications to this, as IIRC this code gets called from GC copying, so some light benchmarking might be in order. > @shipilev any suggestions as to which benchmarks to try to run for this? Otherwise I'll just try our usual internal ones. Just the usual sanity check of benchmarks is fine. If there are regressions on some other benchmarks, we can take care of them after integration. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7567 From redestad at openjdk.java.net Wed Feb 23 12:59:31 2022 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 23 Feb 2022 12:59:31 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v5] In-Reply-To: References: Message-ID: > I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range which only has positive bytes. This allows for dealing much more efficiently with those `byte[]`s that have an ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input. > > Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904 > > - Only implemented on x86 for now, but I want to verify that implementations of `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. This pretty much means holding up this until it's implemented on all platforms, which can either be contributed to this PR or as dependent follow-ups. > > - An alternative to holding up until all platforms are on board is to allow the implementation of `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented. One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field. > > - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`. 
Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even see a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix). Claes Redestad has updated the pull request incrementally with two additional commits since the last revision: - Fix TestCountPositives to correctly allow 0 return when expected != len (for now) - aarch64: fix issue with short inputs divisible by wordSize ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7231/files - new: https://git.openjdk.java.net/jdk/pull/7231/files/a5e28b32..a95680cb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=03-04 Stats: 23 lines in 3 files changed: 3 ins; 4 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/7231.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7231/head:pull/7231 PR: https://git.openjdk.java.net/jdk/pull/7231 From duke at openjdk.java.net Wed Feb 23 13:45:00 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Wed, 23 Feb 2022 13:45:00 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v25] In-Reply-To: References: Message-ID: <8puP07DM-ldrOlaYyU7ex_gpFMjSbRWjfuObimo-XPQ=.fa443055-bf97-4602-a177-4d62682f1f95@github.com> On Tue, 22 Feb 2022 14:35:19 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. 
>> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. 
>> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Merge master > - Merge master > - Merge master > - Error on -XX:-PreserveFramePointer -XX:UseBranchProtection=pac-ret > - Add comments to enter calls > - Set PreserveFramePointer if use_rop_protection is set > - Merge enter_subframe into enter > - Review fixups > - Documentation updates > - Update copyrights to 2022 > - ... and 24 more: https://git.openjdk.java.net/jdk/compare/022d8070...c4e0ee31 I did another full jtreg run, and everything looks fine. Think that's all the review comments resolved now too. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From redestad at openjdk.java.net Wed Feb 23 14:19:20 2022 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 23 Feb 2022 14:19:20 GMT Subject: RFR: 8281146: Replace StringCoding.hasNegatives with countPositives [v6] In-Reply-To: References: Message-ID: > I'm requesting comments and, hopefully, some help with this patch to replace `StringCoding.hasNegatives` with `countPositives`. The new method does a very similar pass, but alters the intrinsic to return the number of leading bytes in the `byte[]` range which only has positive bytes. This allows for dealing much more efficiently with those `byte[]`s that have an ASCII prefix, with no measurable cost on ASCII-only or latin1/UTF16-mostly input. > > Microbenchmark results: https://jmh.morethan.io/?gists=428b487e92e3e47ccb7f169501600a88,3c585de7435506d3a3bdb32160fe8904 > > - Only implemented on x86 for now, but I want to verify that implementations of `countPositives` can be implemented with similar efficiency on all platforms that today implement a `hasNegatives` intrinsic (aarch64, ppc etc) before moving ahead. 
This pretty much means holding up this until it's implemented on all platforms, which can either be contributed to this PR or as dependent follow-ups. > > - An alternative to holding up until all platforms are on board is to allow the implementation of `StringCoding.hasNegatives` and `countPositives` to be implemented so that the non-intrinsified method calls into the intrinsified. This requires structuring the implementations differently based on which intrinsic - if any - is actually implemented. One way to do this could be to mimic how `java.nio` handles unaligned accesses and expose which intrinsic is available via `Unsafe` into a `static final` field. > > - There are a few minor regressions (~5%) in the x86 implementation on `encode-/decodeLatin1Short`. Those regressions disappear when mixing inputs, for example `encode-/decodeShortMixed` even see a minor improvement, which makes me consider those corner case regressions with little real world implications (if you have latin1 Strings, you're likely to also have ASCII-only strings in your mix). Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - Resolve merge conflict - Fix TestCountPositives to correctly allow 0 return when expected != len (for now) - aarch64: fix issue with short inputs divisible by wordSize - Switch aarch64 intrinsic to a variant of countPositives returning len or zero as a first step. - Revert micro changes, split out to #7516 - Merge branch 'master' of https://github.com/cl4es/jdk into count_positives - Merge branch 'master' into count_positives - Restore partial vector checks in AVX2 and SSE intrinsic variants - Let countPositives use hasNegatives to allow ports not implementing the countPositives intrinsic to stay neutral - Simplify changes to encodeUTF8 - ... 
and 19 more: https://git.openjdk.java.net/jdk/compare/5035bf5e...685795ce ------------- Changes: https://git.openjdk.java.net/jdk/pull/7231/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7231&range=05 Stats: 532 lines in 29 files changed: 308 ins; 53 del; 171 mod Patch: https://git.openjdk.java.net/jdk/pull/7231.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7231/head:pull/7231 PR: https://git.openjdk.java.net/jdk/pull/7231 From coleenp at openjdk.java.net Wed Feb 23 14:27:50 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 23 Feb 2022 14:27:50 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic [v2] In-Reply-To: References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: On Wed, 23 Feb 2022 05:38:34 GMT, David Holmes wrote: >> Replace the common "atomic" switch+loop code chunks in the pd code with a shared version that uses Atomic::load/store. >> >> See details in the bug report that show how current code is actually replaced by `memcpy` (in some places at least) whereas the new code is not. >> >> Platforms affected: >> - all x86 >> - Zero >> - Windows Aarch64 >> - PPC >> >> Testing: tiers 1-3 >> Additional builds: tiers 4 and 5 >> - builds covered: x86 and Zero >> >> GHA >> - builds covered: Windows-Aarch64 >> >> The only build affected and not tested is PPC. It would be great if someone could take this for a spin on PPC. >> >> For platforms not affected by this change, i.e. those that already specialise the code, I make no claims regarding the atomicity or otherwise of those specialized versions. That would be for someone interested in those specific platforms to check out. >> >> Thanks, >> David > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Remove underscore from name Why does this go to pd_disjoint_words_atomic when it goes forward to the shared code? 
I suspect the performance implications of the '#else' for x86 is minimal so not worth keeping. ie, it's not really platform dependent anymore really. ------------- PR: https://git.openjdk.java.net/jdk/pull/7567 From duke at openjdk.java.net Wed Feb 23 15:07:05 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 15:07:05 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access Message-ID: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. ------------- Commit messages: - Improve os::is_first_C_frame(...) - Add frame::can_access_link(Thread *t) and use it Changes: https://git.openjdk.java.net/jdk/pull/7591/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8282306 Stats: 35 lines in 10 files changed: 32 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/7591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591 PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 15:31:37 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 15:31:37 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 In-Reply-To: References: Message-ID: On Mon, 21 Feb 2022 14:43:27 GMT, Johannes Bechberger wrote: > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. I've updated it. Thanks again. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7559 From duke at openjdk.java.net Wed Feb 23 15:31:36 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 15:31:36 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 [v2] In-Reply-To: References: Message-ID: <2an2VYo7_ZVP5AuI8q0thE4undL8ejH-EBwf8x9flbc=.6e0418c6-89ab-4b2e-a511-33ac68a44a47@github.com> > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Add changes by David Holmes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7559/files - new: https://git.openjdk.java.net/jdk/pull/7559/files/8364d4b0..9f701eb0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7559&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7559&range=00-01 Stats: 10 lines in 2 files changed: 6 ins; 1 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/7559.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7559/head:pull/7559 PR: https://git.openjdk.java.net/jdk/pull/7559 From clanger at openjdk.java.net Wed Feb 23 15:42:53 2022 From: clanger at openjdk.java.net (Christoph Langer) Date: Wed, 23 Feb 2022 15:42:53 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access In-Reply-To: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: <9suLeDH8iwypmCWZlLeWhGNXHrddUmWDHxsZTqJi0JY=.e83fe160-8f4b-47d9-a4b7-ed8371d258f0@github.com> On Wed, 23 Feb 2022 14:59:49 GMT, Johannes Bechberger wrote: > This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. 
It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method > and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. Changes requested by clanger (Reviewer). src/hotspot/share/runtime/os.cpp line 1227: > 1225: !t->is_in_full_stack((address)fr->fp()) || > 1226: !t->is_in_full_stack((address)fr->sender_sp()) || > 1227: !t->is_in_full_stack((address)fr->link()); Should probably use #ifdef _WINDOWS ... #else ... #endif here ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 15:52:31 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 15:52:31 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 [v3] In-Reply-To: References: Message-ID: > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. Johannes Bechberger has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: Add changes by David Holmes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7559/files - new: https://git.openjdk.java.net/jdk/pull/7559/files/9f701eb0..ca295d34 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7559&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7559&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7559.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7559/head:pull/7559 PR: https://git.openjdk.java.net/jdk/pull/7559 From mbaesken at openjdk.java.net Wed Feb 23 15:53:54 2022 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Wed, 23 Feb 2022 15:53:54 GMT Subject: RFR: JDK-8281015: Further simplify NMT backend [v2] In-Reply-To: <3YEOwRmxyanbFix2WLt8DMgR6cweiAJlR8SbL6AUFE8=.d44da1b1-fba6-423e-97d8-9106a52a4aa7@github.com> References: <3YEOwRmxyanbFix2WLt8DMgR6cweiAJlR8SbL6AUFE8=.d44da1b1-fba6-423e-97d8-9106a52a4aa7@github.com> Message-ID: On Wed, 23 Feb 2022 07:22:07 GMT, Thomas Stuefe wrote: >> NMT backend can be further simplified and cleaned out. >> >> - some entry points require NMT_TrackingLevel as arguments, some use the global tracking level. Ultimately, every part of NMT always uses the global tracking level, so in many cases the explicit parameter can be removed and the global tracking level can be used instead. >> - `MemTracker::malloc_header_size(level)` + `MemTracker::malloc_footer_size(level)` are fused into `MemTracker::overhead_per_malloc()` >> - when adding to `MallocSiteTable`, caller gets back a shortcut to the entry. That shortcut is stored verbatim in the malloc header. It consists of two 16-bit values (bucket index and chain position). That tuple finds its way into many argument lists. It can be simplified into a single 32-bit opaque marker. Code outside the MallocSiteTable does not need to know what it is. 
>> - Currently, the `MallocHeader` class contains a lot of logic. It accounts (in constructor) and de-accounts (in `MallocHeader::release()`). It would simplify code if `MallocHeader` were just a dumb data carrier and the `MallocTracker` would do the actual work. >> - `MallocHeader` can be simplified, almost all members made constant and modifying accessors removed. >> - In some places we handle inputptr=NULL gracefully where we should assert instead >> - Expressions like `MemTracker::tracking_level() != NMT_off` can be simplified to `MemTracker::enabled()`. >> - MemTracker::malloc_base (all variants) can be removed. Note that we have MallocTracker::malloc_header, which achieves the same and does not require casting to the header. >> >> Testing: >> >> - GHAs >> - manually ran NMT gtests (all NMT modes) and NMT jtreg tests on Ubuntu x64 >> - SAP nightlies ran through. Note that since 8275301 "Unify C-heap buffer overrun checks into NMT" NMT is enabled by default in debug builds, so it gets a lot more workout in tests now. >> >> Note that I wanted to manually verify that the gdb "call pp" command still works in order to not break Zhengyu's recent addition, but found it's already broken. I filed https://bugs.openjdk.java.net/browse/JDK-8281023 and am preparing a separate patch. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Zhengyus proposals > - fix build error after merge (need const variant of malloc_header()) > - merge master > - pp should handle NULL correctly > - remove mostly unused MallocTracker accessors for header members > - Remove use of NMT level and simplify malloc+realloc+free > - dumb down malloc header > - mst bucket+pos=marker > - remove malloc_base please check copyright years, e.g. src/hotspot/share/services/memTracker.cpp (still 2021). Otherwise looks okay to me. ------------- Marked as reviewed by mbaesken (Reviewer). 
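The quoted description mentions folding the two 16-bit values (bucket index and chain position) into one opaque 32-bit marker. A minimal C++ sketch of that packing, with hypothetical function names that are not taken from the patch, might look like this:

```cpp
#include <cassert>
#include <cstdint>

// Packs a 16-bit bucket index and a 16-bit chain position into one
// 32-bit value. Callers can pass the result around without knowing
// how it is laid out.
inline std::uint32_t make_marker(std::uint16_t bucket_idx,
                                 std::uint16_t pos_idx) {
  return (static_cast<std::uint32_t>(bucket_idx) << 16) | pos_idx;
}

// Only the table itself needs to take the marker apart again.
inline std::uint16_t bucket_of(std::uint32_t marker) {
  return static_cast<std::uint16_t>(marker >> 16);
}

inline std::uint16_t pos_of(std::uint32_t marker) {
  return static_cast<std::uint16_t>(marker & 0xFFFFu);
}
```

The benefit claimed in the description is mostly about interfaces: one argument instead of two, and no layout knowledge outside the table.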
PR: https://git.openjdk.java.net/jdk/pull/7283 From duke at openjdk.java.net Wed Feb 23 15:56:52 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 15:56:52 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 [v3] In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 15:52:31 GMT, Johannes Bechberger wrote: >> Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. > > Johannes Bechberger has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Add changes by David Holmes I ran my original tests and found no crashes. ------------- PR: https://git.openjdk.java.net/jdk/pull/7559 From duke at openjdk.java.net Wed Feb 23 16:10:25 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 16:10:25 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: > This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method > and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. 
Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Improve use of C macros ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7591/files - new: https://git.openjdk.java.net/jdk/pull/7591/files/9aa9cb6a..4aad3ad2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=00-01 Stats: 6 lines in 1 file changed: 3 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591 PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 16:10:29 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 16:10:29 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: <9suLeDH8iwypmCWZlLeWhGNXHrddUmWDHxsZTqJi0JY=.e83fe160-8f4b-47d9-a4b7-ed8371d258f0@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> <9suLeDH8iwypmCWZlLeWhGNXHrddUmWDHxsZTqJi0JY=.e83fe160-8f4b-47d9-a4b7-ed8371d258f0@github.com> Message-ID: On Wed, 23 Feb 2022 15:39:42 GMT, Christoph Langer wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve use of C macros > > src/hotspot/share/runtime/os.cpp line 1227: > >> 1225: !t->is_in_full_stack((address)fr->fp()) || >> 1226: !t->is_in_full_stack((address)fr->sender_sp()) || >> 1227: !t->is_in_full_stack((address)fr->link()); > > Should probably use > #ifdef _WINDOWS > ... > #else > ... 
> #endif > > here And also in the original method ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From stuefe at openjdk.java.net Wed Feb 23 18:03:47 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 23 Feb 2022 18:03:47 GMT Subject: RFR: JDK-8281015: Further simplify NMT backend [v2] In-Reply-To: References: <3YEOwRmxyanbFix2WLt8DMgR6cweiAJlR8SbL6AUFE8=.d44da1b1-fba6-423e-97d8-9106a52a4aa7@github.com> Message-ID: On Wed, 23 Feb 2022 15:50:21 GMT, Matthias Baesken wrote: > please check copyright years, e.g. src/hotspot/share/services/memTracker.cpp (still 2021). Otherwise looks okay to me. Thank you @MBaesken ! I will fix copyrights before pushing. ------------- PR: https://git.openjdk.java.net/jdk/pull/7283 From stuefe at openjdk.java.net Wed Feb 23 19:36:53 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 23 Feb 2022 19:36:53 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 16:10:25 GMT, Johannes Bechberger wrote: >> This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method >> and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Improve use of C macros Hi Johannes, Thanks for doing this, solving this makes sense. But I'm not sure yours is the right approach. I think it would be better to use SafeFetch to check the addresses in the relevant registers. 
Using Safefetch would mean that we don't depend on the existence of Thread (which may be NULL, especially in signal contexts). It would work if the registers erroneously point into unmapped or guarded portions of the stack, or if Thread is corrupted or outdated. And it would be way simpler, since it would not require a new version of is_first_C_frame. I also find the interface - passing Thread* to the function just for it to then do error checking - slightly off. Without any comment on the prototype explaining what this argument is for, this causes head scratching. And semantically, there is only one instance of Thread this can ever be called for. A function like this: // check if frame is valid within the Thread's stack bool Thread::is_valid_frame(const frame*) would actually be clearer. And if this error check is necessary, why do we then need two variants of is_first_c_frame? Should the error check not always happen? But bottom line, I think safefetch would be a simpler and more robust approach. Cheers, Thomas src/hotspot/cpu/aarch64/frame_aarch64.inline.hpp line 155: > 153: inline intptr_t* frame::link() const { return (intptr_t*) *(intptr_t **)addr_at(link_offset); } > 154: > 155: inline bool frame::can_access_link(Thread *thread) const { return thread->is_in_full_stack((address)addr_at(link_offset)); } is there a reason Thread* is non-const in all your variants of can_access_link and is_first_c_frame? src/hotspot/cpu/ppc/frame_ppc.inline.hpp line 120: > 118: } > 119: > 120: inline bool frame::can_access_link(Thread *thread) const { return true; } Why are ppc and s390 different from other platforms? If there is a valid reason, could you please add a short comment? src/hotspot/cpu/zero/frame_zero.inline.hpp line 85: > 83: } > 84: > 85: inline bool frame::can_access_link(Thread *t) const { Did you test zero? Would this not just crash it? src/hotspot/share/runtime/os.cpp line 1223: > 1221: return true; // native stack isn't walkable on windows this way. 
> 1222: #else > 1223: return !fr->can_access_link(t) || os::is_first_C_frame(fr) || Check t for NULL. ------------- Changes requested by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7591 From bulasevich at openjdk.java.net Wed Feb 23 20:04:20 2022 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Wed, 23 Feb 2022 20:04:20 GMT Subject: RFR: 8280872: Reorder code cache segments to improve code density Message-ID: Currently the codecache segment order is [non-nmethod, non-profiled, profiled]. With this change we move the non-nmethod segment between two code segments. It changes nothing for any platform besides AARCH. In AARCH the offset limit for a branch instruction is 128MB. The bigger jumps are encoded with three instructions. Most of far branches are jumps into the non-nmethod blobs. With the non-nmethod segment in between code segments the jump distance from method to the stub becomes shorter. The result is a 4% reduction in generated code size for the CodeCache range from 128MB to 240MB. 
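To make the distance argument concrete, here is a small C++ sketch under stated assumptions. The segment sizes are hypothetical (a 240 MB code cache split into an 8 MB non-nmethod segment and two 116 MB code segments), the type and function names are illustrative, and the branch range is approximated as 128 MB in either direction.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t MB = 1024 * 1024;
// Approximate reach of a single AArch64 branch instruction.
constexpr std::int64_t kBranchRange = 128 * MB;

// A code cache segment as a half-open address range [start, end).
struct Segment {
  std::int64_t start;
  std::int64_t end;
};

// Largest distance between any address in `code` and any address in
// `stubs`, assuming the two segments do not overlap.
inline std::int64_t max_distance(Segment code, Segment stubs) {
  std::int64_t a = stubs.end - code.start;
  std::int64_t b = code.end - stubs.start;
  return a > b ? a : b;
}

// True if some branch from `code` into `stubs` could exceed the range.
inline bool needs_far_branch(Segment code, Segment stubs) {
  return max_distance(code, stubs) > kBranchRange;
}
```

With the old order, stubs at [0, 8 MB) and the profiled segment at [124 MB, 240 MB) can be up to 240 MB apart, so far branches are needed. With the stubs moved to [116 MB, 124 MB), both code segments stay within 124 MB of them in this hypothetical layout.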
As a side effect, the performance of some tests is slightly improved: ``ArraysFill.testCharFill 10 thrpt 15 170235.720 -> 178477.212 ops/ms`` Testing: jdk/hotspot jtreg and microbenchmarks on AMD and AARCH ------------- Commit messages: - fix name: is_non_nmethod, adding target_needs_far_branch func - change codecache segments order: nonprofiled-nonmethod-profiled Changes: https://git.openjdk.java.net/jdk/pull/7517/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7517&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8280872 Stats: 108 lines in 7 files changed: 47 ins; 38 del; 23 mod Patch: https://git.openjdk.java.net/jdk/pull/7517.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7517/head:pull/7517 PR: https://git.openjdk.java.net/jdk/pull/7517 From duke at openjdk.java.net Wed Feb 23 20:04:22 2022 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 23 Feb 2022 20:04:22 GMT Subject: RFR: 8280872: Reorder code cache segments to improve code density In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 15:40:07 GMT, Boris Ulasevich wrote: > Currently the codecache segment order is [non-nmethod, non-profiled, profiled]. With this change we move the non-nmethod segment between two code segments. It changes nothing for any platform besides AARCH. > > In AARCH the offset limit for a branch instruction is 128MB. The bigger jumps are encoded with three instructions. Most of far branches are jumps into the non-nmethod blobs. With the non-nmethod segment in between code segments the jump distance from method to the stub becomes shorter. The result is a 4% reduction in generated code size for the CodeCache range from 128MB to 240MB. 
> > As a side effect, the performance of some tests is slightly improved: > ``ArraysFill.testCharFill 10 thrpt 15 170235.720 -> 178477.212 ops/ms`` > > Testing: jdk/hotspot jtreg and microbenchmarks on AMD and AARCH src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 411: > 409: assert(CodeCache::find_blob(entry.target()) != NULL, > 410: "destination of far call not found in code cache"); > 411: if (far_branches()) { Can we write something like this: if (is_target_far_from_heap(entry, cbuf->target_code_heap())) { ... } And the implementation: static inline bool is_target_far_from_heap(Address addr, CodeHeap* heap = nullptr) { if (!SegmentedCodeCache || heap == nullptr) { return ReservedCodeCacheSize > branch_range; } return max_dist_to_heap(addr, heap) > branch_range; } src/hotspot/share/code/codeCache.cpp line 893: > 891: } > 892: > 893: bool CodeCache::is_codestub(address addr) { Should it be named `is_non_nmethod`? According to the comments, there can be buffers, adapters and stubs. ------------- PR: https://git.openjdk.java.net/jdk/pull/7517 From bulasevich at openjdk.java.net Wed Feb 23 20:04:23 2022 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Wed, 23 Feb 2022 20:04:23 GMT Subject: RFR: 8280872: Reorder code cache segments to improve code density In-Reply-To: References: Message-ID: <53JfCz6gB1YlkMd2KLpU_J0oHsW98-8RTE7jXthUBPw=.9684208c-01a8-48c9-9c52-a48b45493ad4@github.com> On Tue, 22 Feb 2022 23:53:19 GMT, Evgeny Astigeevich wrote: >> Currently the codecache segment order is [non-nmethod, non-profiled, profiled]. With this change we move the non-nmethod segment between two code segments. It changes nothing for any platform besides AARCH. >> >> In AARCH the offset limit for a branch instruction is 128MB. The bigger jumps are encoded with three instructions. Most of far branches are jumps into the non-nmethod blobs. With the non-nmethod segment in between code segments the jump distance from method to the stub becomes shorter. 
The result is a 4% reduction in generated code size for the CodeCache range from 128MB to 240MB. >> >> As a side effect, the performance of some tests is slightly improved: >> ``ArraysFill.testCharFill 10 thrpt 15 170235.720 -> 178477.212 ops/ms`` >> >> Testing: jdk/hotspot jtreg and microbenchmarks on AMD and AARCH > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 411: > >> 409: assert(CodeCache::find_blob(entry.target()) != NULL, >> 410: "destination of far call not found in code cache"); >> 411: if (far_branches()) { > > Can we write something like this: > > if (is_target_far_from_heap(entry, cbuf->target_code_heap())) { > ... > } > > > And the implementation: > > static inline bool is_target_far_from_heap(Address addr, CodeHeap* heap = nullptr) { > if (!SegmentedCodeCache || heap == nullptr) { > return ReservedCodeCacheSize > branch_range; > } > > return max_dist_to_heap(addr, heap) > branch_range; > } Yes, inline expression is difficult to read. I added target_needs_far_branch, I hope it is better now. > src/hotspot/share/code/codeCache.cpp line 893: > >> 891: } >> 892: >> 893: bool CodeCache::is_codestub(address addr) { > > Should it be named `is_non_nmethod`? According to the comments, there can be buffers, adapters and stubs. Ok. Thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/7517 From dholmes at openjdk.java.net Wed Feb 23 20:23:49 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 23 Feb 2022 20:23:49 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 [v3] In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 15:52:31 GMT, Johannes Bechberger wrote: >> Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. > > Johannes Bechberger has refreshed the contents of this pull request, and previous commits have been removed. 
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Add changes by David Holmes Hi Johannes, Your original changes need removing again. Thanks, David src/hotspot/share/runtime/thread.hpp line 1325: > 1323: // external JNI entry points where the JNIEnv is passed into the VM. > 1324: // Does not return null, check is_thread_from_jni_environment_termminated() > 1325: // if you are not sure that it is not. Needs deleting. src/hotspot/share/runtime/thread.hpp line 1354: > 1352: return current->is_terminated(); > 1353: } > 1354: Needs deleting. ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7559 From vlivanov at openjdk.java.net Wed Feb 23 20:32:54 2022 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 23 Feb 2022 20:32:54 GMT Subject: Integrated: 8280901: MethodHandle::linkToNative stub is missing w/ -Xint In-Reply-To: References: Message-ID: On Mon, 14 Feb 2022 13:40:32 GMT, Vladimir Ivanov wrote: > MethodHandle::linkToNative linker doesn't have a dedicated stub for interpreter. A stub for compiled code is shared and it is invoked through i2c stub when accessed from interpreter. In interpreter-only mode, stubs for compiled code are not generated and linkToNative ends up in a broken state where `Method::_from_interpreted_entry` points to `i2c` stub while `Method::_from_compiled_entry` points to `c2i` stub. > > Proposed fix unconditionally generates a stub for `MethodHandle::linkToNative` case irrespective whether it is a interpreter-only mode or not. > > Testing: test/jdk/java/foreign/ w/ -Xint This pull request has now been integrated. 
Changeset: f86f38a8 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/f86f38a8afd31c76039206f8f1f33371ad814396 Stats: 6 lines in 2 files changed: 5 ins; 0 del; 1 mod 8280901: MethodHandle::linkToNative stub is missing w/ -Xint Reviewed-by: shade, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/7459 From dholmes at openjdk.java.net Wed Feb 23 20:36:58 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 23 Feb 2022 20:36:58 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 16:10:25 GMT, Johannes Bechberger wrote: >> This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method >> and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Improve use of C macros I'm struggling to understand the motivation for this change and what problem is being solved. Do all these extra checks need to be done in product bits or would debug-only work? What kind of errors are we trying to guard against by doing this? Thanks, David src/hotspot/share/utilities/vmError.cpp line 338: > 336: // is_first_C_frame() does only simple checks for frame pointer, > 337: // it will pass if java compiled code has a pointer in EBP. > 338: if (os::is_first_C_frame(&fr, t)) return invalid; Is the comment still accurate? 
------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 21:32:53 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 21:32:53 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 19:06:05 GMT, Thomas Stuefe wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve use of C macros > > src/hotspot/cpu/ppc/frame_ppc.inline.hpp line 120: > >> 118: } >> 119: >> 120: inline bool frame::can_access_link(Thread *thread) const { return true; } > > Why are ppc and s390 different from other platforms? If there is a valid reason, could you please add a short comment? Because they do not (as I see it) directly dereference a location on the stack to get to this value. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 21:39:44 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 21:39:44 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 19:31:03 GMT, Thomas Stuefe wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve use of C macros > > src/hotspot/cpu/aarch64/frame_aarch64.inline.hpp line 155: > >> 153: inline intptr_t* frame::link() const { return (intptr_t*) *(intptr_t **)addr_at(link_offset); } >> 154: >> 155: inline bool frame::can_access_link(Thread *thread) const { return thread->is_in_full_stack((address)addr_at(link_offset)); } > > is there a reason Thread* is non-const in all your variants of can_access_link and is_first_c_frame? No, there is none. > src/hotspot/cpu/zero/frame_zero.inline.hpp line 85: > >> 83: } >> 84: >> 85: inline bool frame::can_access_link(Thread *t) const { > > Did you test zero? Would this not just crash it? You're correct, I'll look into this.
------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 21:39:45 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 21:39:45 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 20:29:43 GMT, David Holmes wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve use of C macros > > src/hotspot/share/utilities/vmError.cpp line 338: > >> 336: // is_first_C_frame() does only simple checks for frame pointer, >> 337: // it will pass if java compiled code has a pointer in EBP. >> 338: if (os::is_first_C_frame(&fr, t)) return invalid; > > Is the comment still accurate? I think so? But maybe removing the second line would be helpful. ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 21:51:46 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 21:51:46 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 [v3] In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 20:18:45 GMT, David Holmes wrote: >> Johannes Bechberger has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Add changes by David Holmes > > src/hotspot/share/runtime/thread.hpp line 1354: > >> 1352: return current->is_terminated(); >> 1353: } >> 1354: > > Needs deleting. Of course. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7559 From bulasevich at openjdk.java.net Wed Feb 23 21:52:11 2022 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Wed, 23 Feb 2022 21:52:11 GMT Subject: RFR: 8280872: Reorder code cache segments to improve code density [v2] In-Reply-To: References: Message-ID: > Currently the codecache segment order is [non-nmethod, non-profiled, profiled]. With this change we move the non-nmethod segment between two code segments. It changes nothing for any platform besides AARCH. > > In AARCH the offset limit for a branch instruction is 128MB. The bigger jumps are encoded with three instructions. Most of far branches are jumps into the non-nmethod blobs. With the non-nmethod segment in between code segments the jump distance from method to the stub becomes shorter. The result is a 4% reduction in generated code size for the CodeCache range from 128MB to 240MB. > > As a side effect, the performance of some tests is slightly improved: > ``ArraysFill.testCharFill 10 thrpt 15 170235.720 -> 178477.212 ops/ms`` > > Testing: jdk/hotspot jtreg and microbenchmarks on AMD and AARCH Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - fix name: is_non_nmethod, adding target_needs_far_branch func - change codecache segments order: nonprofiled-nonmethod-profiled increase far jump threshold: sizeof(codecache)=128M -> sizeof(nonprofiled+nonmethod)=128M ------------- Changes: https://git.openjdk.java.net/jdk/pull/7517/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7517&range=01 Stats: 107 lines in 7 files changed: 46 ins; 38 del; 23 mod Patch: https://git.openjdk.java.net/jdk/pull/7517.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7517/head:pull/7517 PR: https://git.openjdk.java.net/jdk/pull/7517 From duke at openjdk.java.net Wed Feb 23 21:58:56 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 21:58:56 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: <4D3MbB3BO800obAYOqficpSlewTQdQW_y7oP78NQoGg=.9d3b6f0b-1b1e-405f-9bbf-64cb5a46976b@github.com> On Wed, 23 Feb 2022 20:33:28 GMT, David Holmes wrote: > Do all these extra checks need to be done in product bits or would debug-only work? What kind of errors are we trying to guard against by doing this? They currently do not affect production code, but I forgot that the `NativeCallStack` class exists that can make use of it (especially when using the simpler API as @tstuefe correctly proposed). The main motivation is to prevent crashes in native stack walking in cases where just calling `frame.is_safe_for_sender` would return false, but a walk is still possible (typically at the bottom of the native call stack). I currently observe these crashes while working on AsyncGetCallTrace modifications. And to @tstuefe: > But bottom line, I think safefetch would be a simpler and more robust approach. Thanks for the comment.
I missed that safefetch does exactly what I want (and hopefully without a large performance penalty?). ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From dholmes at openjdk.java.net Wed Feb 23 21:59:06 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 23 Feb 2022 21:59:06 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: <35D5t24EajEpM7JwVi9Q36-admzDEGaIcwpeKDtvtgo=.fe380706-76ce-4fdd-8296-725a64632702@github.com> On Wed, 23 Feb 2022 19:26:50 GMT, Thomas Stuefe wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve use of C macros > > src/hotspot/share/runtime/os.cpp line 1223: > >> 1221: return true; // native stack isn't walkable on windows this way. >> 1222: #else >> 1223: return !fr->can_access_link(t) || os::is_first_C_frame(fr) || > > Check t for NULL. I would assert for not NULL and ensure the caller only uses this with a non-NULL thread. ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From dholmes at openjdk.java.net Wed Feb 23 21:59:13 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 23 Feb 2022 21:59:13 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 21:35:53 GMT, Johannes Bechberger wrote: >> src/hotspot/share/utilities/vmError.cpp line 338: >> >>> 336: // is_first_C_frame() does only simple checks for frame pointer, >>> 337: // it will pass if java compiled code has a pointer in EBP. >>> 338: if (os::is_first_C_frame(&fr, t)) return invalid; >> >> Is the comment still accurate? > > I think so? But maybe removing the second line would be helpful.
But are the checks still "simple"? ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 21:59:35 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 21:59:35 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 [v4] In-Reply-To: References: Message-ID: > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Remove old code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7559/files - new: https://git.openjdk.java.net/jdk/pull/7559/files/ca295d34..b5bd5f6e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7559&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7559&range=02-03 Stats: 16 lines in 1 file changed: 0 ins; 16 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7559.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7559/head:pull/7559 PR: https://git.openjdk.java.net/jdk/pull/7559 From dholmes at openjdk.java.net Wed Feb 23 22:02:11 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 23 Feb 2022 22:02:11 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 16:10:25 GMT, Johannes Bechberger wrote: >> This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method >> and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. 
This should reduce the possibilities for crashes. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Improve use of C macros src/hotspot/share/runtime/os.cpp line 1227: > 1225: !t->is_in_full_stack((address)fr->fp()) || > 1226: !t->is_in_full_stack((address)fr->sender_sp()) || > 1227: !t->is_in_full_stack((address)fr->link()); Isn't this check of `fr.link()` what you already did in `can_access_link`? ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From dholmes at openjdk.java.net Wed Feb 23 22:04:03 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 23 Feb 2022 22:04:03 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 [v4] In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 21:59:35 GMT, Johannes Bechberger wrote: >> Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Remove old code Looks good to me (but I am biased :) )! Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7559 From duke at openjdk.java.net Wed Feb 23 22:10:06 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 22:10:06 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 21:58:59 GMT, David Holmes wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve use of C macros > > src/hotspot/share/runtime/os.cpp line 1227: > >> 1225: !t->is_in_full_stack((address)fr->fp()) || >> 1226: !t->is_in_full_stack((address)fr->sender_sp()) || >> 1227: !t->is_in_full_stack((address)fr->link()); > > Isn't this check of `fr.link()` what you already did in `can_access_link`? You're correct. ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 22:36:03 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 22:36:03 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 22:06:47 GMT, Johannes Bechberger wrote: >> src/hotspot/share/runtime/os.cpp line 1227: >> >>> 1225: !t->is_in_full_stack((address)fr->fp()) || >>> 1226: !t->is_in_full_stack((address)fr->sender_sp()) || >>> 1227: !t->is_in_full_stack((address)fr->link()); >> >> Isn't this check of `fr.link()` what you already did in `can_access_link`? > > You're correct. But as I said, I'm going to remove these checks altogether.
------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 22:42:04 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 22:42:04 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 21:56:49 GMT, David Holmes wrote: >> I think so? But maybe removing the second line would be helpful. > > But are the checks still "simple"? After the change proposed by Thomas: I think so, it still only checks the pointer value and safefetches the value of the stack pointer, ... to check whether they are valid. ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 22:48:07 2022 From: duke at openjdk.java.net (Vamsi Parasa) Date: Wed, 23 Feb 2022 22:48:07 GMT Subject: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 05:43:10 GMT, Jatin Bhateja wrote: >> Optimizes the divideUnsigned() and remainderUnsigned() methods in java.lang.Integer and java.lang.Long classes using x86 intrinsics. This change shows 3x improvement for Integer methods and upto 25% improvement for Long. This change also implements the DivMod optimization which fuses division and modulus operations if needed. The DivMod optimization shows 3x improvement for Integer and ~65% improvement for Long. > > src/hotspot/cpu/x86/x86_64.ad line 8602: > >> 8600: __ jmp(done); >> 8601: __ bind(neg_divisor_fastpath); >> 8602: // Fastpath for divisor < 0: > > Move in macro assembly routine. 
Sure, will move it to a macro assembly routine. > src/hotspot/cpu/x86/x86_64.ad line 8633: > >> 8631: __ jmp(done); >> 8632: __ bind(neg_divisor_fastpath); >> 8633: // Fastpath for divisor < 0: > > Move in macro assembly routine. Sure, will move it to a macro assembly routine. > src/hotspot/cpu/x86/x86_64.ad line 8902: > >> 8900: __ subl(tmp_rax, divisor); >> 8901: __ andnl(tmp_rax, tmp_rax, rdx); >> 8902: __ sarl(tmp_rax, 31); > > Please move this into a macro assembly routine. Sure, will move it to a macro assembly routine. > src/hotspot/cpu/x86/x86_64.ad line 8932: > >> 8930: // Fastpath when divisor < 0: >> 8931: // remainder = dividend - (((dividend & ~(dividend - divisor)) >> (Long.SIZE - 1)) & divisor) >> 8932: // See Hacker's Delight (2nd ed), section 9.3 which is implemented in java.lang.Long.remainderUnsigned() > > Please move it into a macro assembly routine. Sure, will move it to a macro assembly routine. > src/hotspot/share/opto/compile.cpp line 3499: > >> 3497: Node* d = n->find_similar(Op_UDivI); >> 3498: if (d) { >> 3499: // Replace them with a fused unsigned divmod if supported > > Can you explain a bit here why this transformation can't be handled earlier? This follows the existing approach used for signed DivMod. > test/micro/org/openjdk/bench/java/lang/LongDivMod.java line 75: > >> 73: } >> 74: return quotients; >> 75: } > > Do we need to return quotients, since it's a field being explicitly modified? Will remove it. > test/micro/org/openjdk/bench/java/lang/LongDivMod.java line 82: > >> 80: remainders[i] = Long.remainderUnsigned(dividends[i], divisors[i]); >> 81: } >> 82: return remainders; > > Same as above. Will remove it.
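For readers following the `divisor < 0` fast path quoted above, here is a minimal Java sketch of the formula from the code comment. It is not the intrinsic or the macro assembly routine from the PR; the class and method names are made up for illustration. The idea is that a divisor with the sign bit set is at least 2^63 (or 2^31) as an unsigned value, so the unsigned quotient can only be 0 or 1.

```java
// Illustrative sketch only; class and method names are hypothetical and not part of the PR.
public final class UnsignedRemainderSketch {
    private UnsignedRemainderSketch() {}

    // 64-bit fast path for a divisor whose sign bit is set (divisor < 0 as a signed long).
    // The unsigned quotient is then 0 or 1, so the remainder is either
    // dividend or dividend - divisor.
    static long remainderLong(long dividend, long divisor) {
        if (divisor >= 0) {
            throw new IllegalArgumentException("fast path requires divisor < 0");
        }
        // All ones when the quotient is 1, zero when the quotient is 0.
        long mask = (dividend & ~(dividend - divisor)) >> (Long.SIZE - 1);
        return dividend - (mask & divisor);
    }

    // Same formula for 32-bit operands.
    static int remainderInt(int dividend, int divisor) {
        if (divisor >= 0) {
            throw new IllegalArgumentException("fast path requires divisor < 0");
        }
        int mask = (dividend & ~(dividend - divisor)) >> (Integer.SIZE - 1);
        return dividend - (mask & divisor);
    }
}
```

The results can be compared against `Long.remainderUnsigned` and `Integer.remainderUnsigned` for any divisor with the sign bit set; for non-negative divisors this sketch deliberately refuses to run, since the formula does not apply there.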
------------- PR: https://git.openjdk.java.net/jdk/pull/7572 From duke at openjdk.java.net Wed Feb 23 22:51:45 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 22:51:45 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v2] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 16:10:25 GMT, Johannes Bechberger wrote: >> This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method >> and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Improve use of C macros The last commit rewrites it to something that might resemble Thomas' ideas. ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 22:51:44 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Wed, 23 Feb 2022 22:51:44 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v3] In-Reply-To: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: > This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. 
It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method > and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Use safefetch ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7591/files - new: https://git.openjdk.java.net/jdk/pull/7591/files/4aad3ad2..5b7d6004 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=01-02 Stats: 50 lines in 10 files changed: 3 ins; 36 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/7591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591 PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Wed Feb 23 22:55:11 2022 From: duke at openjdk.java.net (Vamsi Parasa) Date: Wed, 23 Feb 2022 22:55:11 GMT Subject: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 05:52:00 GMT, Jatin Bhateja wrote: >> Optimizes the divideUnsigned() and remainderUnsigned() methods in java.lang.Integer and java.lang.Long classes using x86 intrinsics. This change shows 3x improvement for Integer methods and upto 25% improvement for Long. This change also implements the DivMod optimization which fuses division and modulus operations if needed. The DivMod optimization shows 3x improvement for Integer and ~65% improvement for Long. > > test/micro/org/openjdk/bench/java/lang/IntegerDivMod.java line 76: > >> 74: return quotients; >> 75: } >> 76: > > Return seems redundant here. Will remove it. 
> test/micro/org/openjdk/bench/java/lang/IntegerDivMod.java line 83: > >> 81: } >> 82: return remainders; >> 83: } > > Return seems redundant here. Will remove it. ------------- PR: https://git.openjdk.java.net/jdk/pull/7572 From duke at openjdk.java.net Wed Feb 23 23:11:03 2022 From: duke at openjdk.java.net (Vamsi Parasa) Date: Wed, 23 Feb 2022 23:11:03 GMT Subject: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 05:46:45 GMT, Jatin Bhateja wrote: >> Optimizes the divideUnsigned() and remainderUnsigned() methods in java.lang.Integer and java.lang.Long classes using x86 intrinsics. This change shows 3x improvement for Integer methods and upto 25% improvement for Long. This change also implements the DivMod optimization which fuses division and modulus operations if needed. The DivMod optimization shows 3x improvement for Integer and ~65% improvement for Long. > > src/hotspot/share/opto/divnode.cpp line 1350: > >> 1348: return NULL; >> 1349: } >> 1350: > > Please remove Value and Ideal routines if no explicit transforms are being done. Will remove the unused transformations. > src/hotspot/share/opto/divnode.cpp line 1362: > >> 1360: } >> 1361: >> 1362: //============================================================================= > > You can remove the Ideal routine if no transformation is being done. Will remove the unused transformation. ------------- PR: https://git.openjdk.java.net/jdk/pull/7572 From duke at openjdk.java.net Wed Feb 23 23:18:56 2022 From: duke at openjdk.java.net (Vamsi Parasa) Date: Wed, 23 Feb 2022 23:18:56 GMT Subject: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v3] In-Reply-To: References: Message-ID: > Optimizes the divideUnsigned() and remainderUnsigned() methods in java.lang.Integer and java.lang.Long classes using x86 intrinsics.
This change shows 3x improvement for Integer methods and upto 25% improvement for Long. This change also implements the DivMod optimization which fuses division and modulus operations if needed. The DivMod optimization shows 3x improvement for Integer and ~65% improvement for Long. Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Fix line at end of file ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7572/files - new: https://git.openjdk.java.net/jdk/pull/7572/files/7fc18af3..13549290 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7572&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7572&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572 From duke at openjdk.java.net Wed Feb 23 23:15:53 2022 From: duke at openjdk.java.net (Vamsi Parasa) Date: Wed, 23 Feb 2022 23:15:53 GMT Subject: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v2] In-Reply-To: References: Message-ID: > Optimizes the divideUnsigned() and remainderUnsigned() methods in java.lang.Integer and java.lang.Long classes using x86 intrinsics. This change shows 3x improvement for Integer methods and upto 25% improvement for Long. This change also implements the DivMod optimization which fuses division and modulus operations if needed. The DivMod optimization shows 3x improvement for Integer and ~65% improvement for Long. 
Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Move intrinsic code to macro assembly routines; remove unused transformations for div and mod nodes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7572/files - new: https://git.openjdk.java.net/jdk/pull/7572/files/fa57175a..7fc18af3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7572&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7572&range=00-01 Stats: 326 lines in 7 files changed: 137 ins; 176 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572 From sviswanathan at openjdk.java.net Thu Feb 24 00:47:06 2022 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 24 Feb 2022 00:47:06 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v7] In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 09:03:37 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- >> 1024.00 | 510.41 | 1811.66 | 3.55 | 510.40 | 502.65 | 0.98 >> 2048.00 | 293.52 | 984.37 | 3.35 | 304.96 | 177.88 | 0.58 >> 1024.00 | 825.94 | 3387.64 | 4.10 | 750.77 | 1925.15 | 2.56 >> 2048.00 | 411.91 | 1942.87 | 4.72 | 412.22 | 1034.13 | 2.51 >> >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8279508: Review comments resolved. Also curious, how does the performance look with all these changes. src/hotspot/cpu/x86/assembler_x86.hpp line 2254: > 2252: void vroundps(XMMRegister dst, XMMRegister src, int32_t rmode, int vector_len); > 2253: void vrndscaleps(XMMRegister dst, XMMRegister src, int32_t rmode, int vector_len); > 2254: These instructions are not used anymore and can be removed. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4116: > 4114: KRegister ktmp1, KRegister ktmp2, AddressLiteral double_sign_flip, > 4115: Register scratch, int vec_enc) { > 4116: evcvttpd2qq(dst, src, vec_enc); The vcvttpd2qq instruction on overflow sets the result as 2^w -1 where w is 64. Whereas the special case handling is expecting 0x80000..... src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4145: > 4143: evpbroadcastq(xtmp1, scratch, vec_enc); > 4144: vaddpd(xtmp1, src , xtmp1, vec_enc); > 4145: evcvtpd2qq(dst, xtmp1, vec_enc); The vcvtpd2qq instruction on overflow also sets the result as 2^w -1 where w is 64. Whereas the special case handling is expecting 0x80000..... src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4176: > 4174: vpbroadcastd(xtmp1, xtmp1, vec_enc); > 4175: vaddps(xtmp1, src , xtmp1, vec_enc); > 4176: vcvtps2dq(dst, xtmp1, vec_enc); The vcvtps2dq returns 0x7FFFFFFF in case of overflow whereas the special case handling expects 0x80000000 incase of overflow. The same question applies to the corresponding vector_round_float_avx() implementation as well. 
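
For reference, the Java-level behavior that the vector special-case handling has to reproduce can be sketched in plain Java. This is only an illustration of the documented `Math.round(float)` contract (NaN maps to 0, out-of-range values saturate); it is not code from the patch, and the class name `RoundFloatSpecialCases` is hypothetical.

```java
// Hypothetical demo class; not part of the patch under review.
final class RoundFloatSpecialCases {
    // Returns true when Math.round(float) shows the documented special-case results.
    static boolean specialCasesHold() {
        return Math.round(Float.NaN) == 0
            && Math.round(Float.POSITIVE_INFINITY) == Integer.MAX_VALUE
            && Math.round(Float.NEGATIVE_INFINITY) == Integer.MIN_VALUE
            && Math.round(3.0e9f) == Integer.MAX_VALUE    // finite, above int range
            && Math.round(-3.0e9f) == Integer.MIN_VALUE;  // finite, below int range
    }

    public static void main(String[] args) {
        System.out.println("special cases hold: " + specialCasesHold());
        // Print the saturated results as bit patterns for comparison with
        // whatever value the conversion instruction leaves in the lane.
        System.out.printf("round(3.0e9f)  = 0x%08X%n", Math.round(3.0e9f));
        System.out.printf("round(-3.0e9f) = 0x%08X%n", Math.round(-3.0e9f));
    }
}
```

Positive and negative overflow need different results at the Java level, so a single sentinel value coming out of the conversion instruction would have to be fixed up per lane depending on the sign of the input.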
------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From sviswanathan at openjdk.java.net Thu Feb 24 01:47:07 2022 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 24 Feb 2022 01:47:07 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v7] In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 09:03:37 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- >> 1024.00 | 510.41 | 1811.66 | 3.55 | 510.40 | 502.65 | 0.98 >> 2048.00 | 293.52 | 984.37 | 3.35 | 304.96 | 177.88 | 0.58 >> 1024.00 | 825.94 | 3387.64 | 4.10 | 750.77 | 1925.15 | 2.56 >> 2048.00 | 411.91 | 1942.87 | 4.72 | 412.22 | 1034.13 | 2.51 >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8279508: Review comments resolved. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8984: > 8982: } > 8983: > 8984: void MacroAssembler::round_double(Register dst, XMMRegister src, Register rtmp, Register rcx) { Is it possible to implement this using the similar mxcsr change? In any case comments will help to review round_double and round_float code. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From sviswanathan at openjdk.java.net Thu Feb 24 02:00:05 2022 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 24 Feb 2022 02:00:05 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v7] In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 09:03:37 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- >> 1024.00 | 510.41 | 1811.66 | 3.55 | 510.40 | 502.65 | 0.98 >> 2048.00 | 293.52 | 984.37 | 3.35 | 304.96 | 177.88 | 0.58 >> 1024.00 | 825.94 | 3387.64 | 4.10 | 750.77 | 1925.15 | 2.56 >> 2048.00 | 411.91 | 1942.87 | 4.72 | 412.22 | 1034.13 | 2.51 >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8279508: Review comments resolved. test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java line 441: > 439: errn += verify("test_round: ", 1, l0[1], Long.MAX_VALUE); > 440: errn += verify("test_round: ", 2, l0[2], Long.MIN_VALUE); > 441: errn += verify("test_round: ", 3, l0[3], Long.MAX_VALUE); Good to add additional test cases: Case with a1 value >= Long Max and < infinity. Case with a1 value <= Long Min and > -infinity. 
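
The two additional cases suggested above could look roughly like the following standalone check. It is a sketch under assumptions: it does not use the `verify` helper from `TestDoubleVect.java`, the class name `RoundDoubleBoundaryCases` is hypothetical, and the input values are just examples of finite doubles at or beyond the `long` range.

```java
// Hypothetical standalone check; the real test would go through the
// existing helpers in TestDoubleVect.java instead.
final class RoundDoubleBoundaryCases {
    static boolean boundariesSaturate() {
        double atLongMax = 0x1.0p63;      // 2^63: finite and above Long.MAX_VALUE
        double farAboveMax = 1.0e19;      // finite, well above Long.MAX_VALUE
        double atLongMin = -0x1.0p63;     // -2^63: exactly Long.MIN_VALUE
        double farBelowMin = -1.0e19;     // finite, well below Long.MIN_VALUE
        return Math.round(atLongMax) == Long.MAX_VALUE
            && Math.round(farAboveMax) == Long.MAX_VALUE
            && Math.round(Double.MAX_VALUE) == Long.MAX_VALUE
            && Math.round(atLongMin) == Long.MIN_VALUE
            && Math.round(farBelowMin) == Long.MIN_VALUE
            && Math.round(-Double.MAX_VALUE) == Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("boundaries saturate: " + boundariesSaturate());
    }
}
```

These inputs are finite, so they exercise the overflow path of the conversion separately from the infinity and NaN cases already covered.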
------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From duke at openjdk.java.net Thu Feb 24 02:43:46 2022 From: duke at openjdk.java.net (Vamsi Parasa) Date: Thu, 24 Feb 2022 02:43:46 GMT Subject: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4] In-Reply-To: References: Message-ID: > Optimizes the divideUnsigned() and remainderUnsigned() methods in java.lang.Integer and java.lang.Long classes using x86 intrinsics. This change shows 3x improvement for Integer methods and upto 25% improvement for Long. This change also implements the DivMod optimization which fuses division and modulus operations if needed. The DivMod optimization shows 3x improvement for Integer and ~65% improvement for Long. Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: fix 32bit build issues ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7572/files - new: https://git.openjdk.java.net/jdk/pull/7572/files/13549290..2915b2e7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7572&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7572&range=02-03 Stats: 91 lines in 2 files changed: 49 ins; 42 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572 From dholmes at openjdk.java.net Thu Feb 24 02:45:07 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 24 Feb 2022 02:45:07 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v3] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 22:51:44 GMT, Johannes Bechberger wrote: >> This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link 
information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method >> and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Use safefetch src/hotspot/share/runtime/os.cpp line 1192: > 1190: > 1191: uintptr_t usp = (uintptr_t)fr->sp(); > 1192: if ((usp & sp_align_mask) != 0 || SafeFetchN(fr->sp(), 0) == 0) return true; This doesn't quite make sense to me. If the SafeFetchN were to fail then the load in the previous line would already have crashed wouldn't it? ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From dholmes at openjdk.java.net Thu Feb 24 03:53:05 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 24 Feb 2022 03:53:05 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v3] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Thu, 24 Feb 2022 02:41:25 GMT, David Holmes wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Use safefetch > > src/hotspot/share/runtime/os.cpp line 1192: > >> 1190: >> 1191: uintptr_t usp = (uintptr_t)fr->sp(); >> 1192: if ((usp & sp_align_mask) != 0 || SafeFetchN(fr->sp(), 0) == 0) return true; > > This doesn't quite make sense to me. If the SafeFetchN were to fail then the load in the previous line would already have crashed wouldn't it? Sorry ignore that. The SafeFetch loads `*fr->sp()`. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From dholmes at openjdk.java.net Thu Feb 24 03:53:04 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 24 Feb 2022 03:53:04 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v3] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 22:51:44 GMT, Johannes Bechberger wrote: >> This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method >> and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Use safefetch This approach looks much better/cleaner - thanks. Do we have any crash tests we can use to verify this? Thanks, David ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From stuefe at openjdk.java.net Thu Feb 24 06:21:08 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 24 Feb 2022 06:21:08 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v3] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 22:51:44 GMT, Johannes Bechberger wrote: >> This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. 
It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method >> and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Use safefetch Hi Johannes, thanks for taking my suggestion. This is better, and helps beyond your AsyncGetCallTrace scenario (e.g. in NMT). safefetch works as an unconditional sub routine call to a prolog-free piece of code which does a single load. Basically: (1) jump -> (2) load from questionable address -> (3) return and the signal handler knows how to handle things if a segfault happens at (2). So, for the standard case, if no fault happens, you pay for a subroutine call and a load. This is as cheap as it gets, but still not as cheap as a single inline load would be. --- Still, I'm not sure I would add this to such a low-level function as frame::link(), at least not without analyzing the callers. Most of the callers of frame::link don't seem to be that performance-sensitive that a sub-routine call would throw them off. But I'm not sure here. Moreover, even though your solution is beautifully simple, I don't like "lying" at this level. There may be cases where we rather have an honest crash when dereferencing an invalid frame, because we may want to analyze the root cause. What I actually had in mind - sorry I was not too clear in my first review - was to use SafeFetch inside is_first_C_frame to check the validity of the link before dereferencing it. `is_first_C_frame()` is not super performace-critical, so it should be fine to use safefetch here. Note that we have `os::is_readable_pointer()` which encapsulates SafeFetch for checking pointer validity. So I imagine something like this: bool frame::link_is_valid() { return os::is_readable_pointer(link); } ... 
bool os::is_first_C_frame(frame* fr) { ... // If the link address is invalid we are not walkable beyond this point. if (!fr->link_is_valid()) return true; } @dholmes-ora : the motivation is to harden a piece of code which may run in unsafe situations in production scenarios. Examples: AsyncGetCallTrace, stack printing in error reports, stack printing in NMT... Error handling has its secondary crash guards, but the other scenarios are "naked". And we have downstream additional facilities which use VM stack printing. About a test, I agree, that would be nice. But one would have to "fake" an invalid stack. Maybe a new error reporting test where one deliberately overwrites portions of the stack and then tries to print the stack. However, I imagine things could be brittle, because the OS may catch a stack overwrite first. It's not totally trivial, maybe something for a separate RFE? Cheers, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Thu Feb 24 07:25:07 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Thu, 24 Feb 2022 07:25:07 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v3] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Wed, 23 Feb 2022 22:51:44 GMT, Johannes Bechberger wrote: >> This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method >> and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Use safefetch Now I understand.
A simple test would be to just allocate an area of zeroes and then create a frame for it. The proposed changes should prevent it from crashing. ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From mdoerr at openjdk.java.net Thu Feb 24 09:32:12 2022 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 24 Feb 2022 09:32:12 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 [v4] In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 21:59:35 GMT, Johannes Bechberger wrote: >> Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Remove old code LGTM. src/hotspot/share/prims/forte.cpp line 565: > 563: JNIEXPORT > 564: void AsyncGetCallTrace(ASGCT_CallTrace *trace, jint depth, void* ucontext) { > 565: Feel free to remove the extra newline. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7559 From kevinw at openjdk.java.net Thu Feb 24 10:10:06 2022 From: kevinw at openjdk.java.net (Kevin Walls) Date: Thu, 24 Feb 2022 10:10:06 GMT Subject: RFR: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 [v4] In-Reply-To: References: Message-ID: On Wed, 23 Feb 2022 21:59:35 GMT, Johannes Bechberger wrote: >> Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Remove old code Marked as reviewed by kevinw (Committer).
------------- PR: https://git.openjdk.java.net/jdk/pull/7559 From duke at openjdk.java.net Thu Feb 24 10:55:28 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Thu, 24 Feb 2022 10:55:28 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v25] In-Reply-To: References: Message-ID: <0Ob4kezo_Q0ro0eF_OeEABrzYeZCNmoaD5KQUcBpZRc=.6c772f45-c70c-4983-880a-8878e281d04b@github.com> On Tue, 22 Feb 2022 14:35:19 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. 
If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Merge master > - Merge master > - Merge master > - Error on -XX:-PreserveFramePointer -XX:UseBranchProtection=pac-ret > - Add comments to enter calls > - Set PreserveFramePointer if use_rop_protection is set > - Merge enter_subframe into enter > - Review fixups > - Documentation updates > - Update copyrights to 2022 > - ... and 24 more: https://git.openjdk.java.net/jdk/compare/022d8070...c4e0ee31 Any more comments?
Otherwise I'll integrate later. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From dholmes at openjdk.java.net Thu Feb 24 11:48:18 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 24 Feb 2022 11:48:18 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic [v2] In-Reply-To: References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: On Wed, 23 Feb 2022 05:38:34 GMT, David Holmes wrote: >> Replace the common "atomic" switch+loop code chunks in the pd code with a shared version that uses Atomic::load/store. >> >> See details in the bug report that show how current code is actually replaced by `memcpy` (in some places at least) whereas the new code is not. >> >> Platforms affected: >> - all x86 >> - Zero >> - Windows Aarch64 >> - PPC >> >> Testing: tiers 1-3 >> Additional builds: tiers 4 and 5 >> - builds covered: x86 and Zero >> >> GHA >> - builds covered: Windows-Aarch64 >> >> The only build affected and not tested is PPC. It would be great if someone could take this for a spin on PPC. >> >> For platforms not affected by this change, i.e. those that already specialise the code, I make no claims regarding the atomicity or otherwise of those specialized versions. That would be for someone interested in those specific platforms to check out. >> >> Thanks, >> David > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Remove underscore from name I ran some GC benchmarks which turned out to be just specjbb2005 and specjvm2008-*. There were two regressions flagged: Linux-x64: SPECjvm2008-LU.large-ZGC -5.82% macos-x64: SPECjvm2008-Serial-ParGC -4.16% However, Erik thinks these are just noise as apparently ZGC doesn't use these atomic copy routines, nor does he think ParGC does either. Thoughts?
------------- PR: https://git.openjdk.java.net/jdk/pull/7567 From coleenp at openjdk.java.net Thu Feb 24 12:56:43 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 24 Feb 2022 12:56:43 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only Message-ID: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. Tested with tier1 on Oracle platforms. ------------- Commit messages: - 8282240: Add _name field to Method for NOT_PRODUCT only Changes: https://git.openjdk.java.net/jdk/pull/7608/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7608&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8282240 Stats: 14 lines in 4 files changed: 10 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/7608.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7608/head:pull/7608 PR: https://git.openjdk.java.net/jdk/pull/7608 From jbhateja at openjdk.java.net Thu Feb 24 13:01:59 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 24 Feb 2022 13:01:59 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v7] In-Reply-To: References: Message-ID: On Thu, 24 Feb 2022 01:43:27 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8279508: Review comments resolved. > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8984: > >> 8982: } >> 8983: >> 8984: void MacroAssembler::round_double(Register dst, XMMRegister src, Register rtmp, Register rcx) { > > Is it possible to implement this using the similar mxcsr change? In any case comments will help to review round_double and round_float code. LDMXCSR has multi-cycle latency and it will degrade the performance of scalar operation's fast path. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From jbhateja at openjdk.java.net Thu Feb 24 13:01:58 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 24 Feb 2022 13:01:58 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v8] In-Reply-To: References: Message-ID: > Summary of changes: > - Intrinsify Math.round(float) and Math.round(double) APIs. > - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. > - Test creation using new IR testing framework. > > Following are the performance number of a JMH micro included with the patch > > Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) > > > TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio > -- | -- | -- | -- | -- | -- | -- > 1024.00 | 510.41 | 1811.66 | 3.55 | 510.40 | 502.65 | 0.98 > 2048.00 | 293.52 | 984.37 | 3.35 | 304.96 | 177.88 | 0.58 > 1024.00 | 825.94 | 3387.64 | 4.10 | 750.77 | 1925.15 | 2.56 > 2048.00 | 411.91 | 1942.87 | 4.72 | 412.22 | 1034.13 | 2.51 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8279508: Review comments resolved. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7094/files - new: https://git.openjdk.java.net/jdk/pull/7094/files/6c869c76..f7dec3d9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=06-07 Stats: 35 lines in 5 files changed: 8 ins; 22 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/7094.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094 PR: https://git.openjdk.java.net/jdk/pull/7094 From coleenp at openjdk.java.net Thu Feb 24 14:03:49 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 24 Feb 2022 14:03:49 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only [v2] In-Reply-To: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> Message-ID: > Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. > Tested with tier1 on Oracle platforms. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Enhance comment to say why name needs to be set later. - 8282240: Add _name field to Method for NOT_PRODUCT only - Merge branch 'master' into method-name - Enhance comment to say why name needs to be set later. 
- 8282240: Add _name field to Method for NOT_PRODUCT only ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7608/files - new: https://git.openjdk.java.net/jdk/pull/7608/files/ea440441..ab762ed7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7608&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7608&range=00-01 Stats: 168915 lines in 3622 files changed: 116842 ins; 28264 del; 23809 mod Patch: https://git.openjdk.java.net/jdk/pull/7608.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7608/head:pull/7608 PR: https://git.openjdk.java.net/jdk/pull/7608 From jbhateja at openjdk.java.net Thu Feb 24 14:18:13 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 24 Feb 2022 14:18:13 GMT Subject: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4] In-Reply-To: References: Message-ID: On Thu, 24 Feb 2022 02:43:46 GMT, Vamsi Parasa wrote: >> Optimizes the divideUnsigned() and remainderUnsigned() methods in java.lang.Integer and java.lang.Long classes using x86 intrinsics. This change shows 3x improvement for Integer methods and upto 25% improvement for Long. This change also implements the DivMod optimization which fuses division and modulus operations if needed. The DivMod optimization shows 3x improvement for Integer and ~65% improvement for Long. > > Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix 32bit build issues src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4408: > 4406: jmp(done); > 4407: bind(neg_divisor_fastpath); > 4408: // Fastpath for divisor < 0: How about checking if divisor is +ve or -ve constant and non-constant dividend in identity routine and setting a flag in IR node, which can be used to either emit fast / slow path in a new instruction selection pattern. It will save emitting redundant instructions. 
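
The arithmetic behind a "divisor < 0" fast path can be modelled in plain Java. The sketch below is an assumption about what such a fast path computes, using a well-known branch-free form (described in Hacker's Delight); it is not the macro-assembler code from the patch, and `UnsignedDivFastPath` and its method names are hypothetical. When the divisor is negative as a signed int, it is at least 2^31 as an unsigned value, so the unsigned quotient can only be 0 or 1.

```java
// Hypothetical scalar model of the fast path; not taken from the patch.
final class UnsignedDivFastPath {
    // Valid only when divisor < 0 (signed), i.e. divisor >= 2^31 unsigned.
    // The quotient is 1 exactly when dividend >= divisor as unsigned values.
    static int quotientNegDivisor(int dividend, int divisor) {
        return (dividend & ~(dividend - divisor)) >>> (Integer.SIZE - 1);
    }

    // Same precondition. Subtracts the divisor once if the quotient is 1.
    static int remainderNegDivisor(int dividend, int divisor) {
        int q = quotientNegDivisor(dividend, divisor);
        return dividend - (divisor & -q);  // -q is 0 or all ones
    }

    public static void main(String[] args) {
        int[] dividends = {0, 1, 5, Integer.MAX_VALUE, Integer.MIN_VALUE, -2, -1};
        int[] divisors = {Integer.MIN_VALUE, -3, -2, -1};
        boolean allMatch = true;
        for (int n : dividends) {
            for (int d : divisors) {
                // Compare against the library methods as the reference.
                allMatch &= quotientNegDivisor(n, d) == Integer.divideUnsigned(n, d);
                allMatch &= remainderNegDivisor(n, d) == Integer.remainderUnsigned(n, d);
            }
        }
        System.out.println("fast path matches library: " + allMatch);
    }
}
```

If the divisor is a compile-time constant whose sign is known, only one of the two code shapes (this fast path or the general division) is ever needed, which is the saving the suggestion above is after.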
src/hotspot/share/opto/divnode.cpp line 881: > 879: return (phase->type( in(2) )->higher_equal(TypeLong::ONE)) ? in(1) : this; > 880: } > 881: //------------------------------Value------------------------------------------ Ideal transform to replace unsigned divide by cheaper logical right shift instruction if divisor is POW will be useful. src/hotspot/share/opto/divnode.cpp line 897: > 895: > 896: // Either input is BOTTOM ==> the result is the local BOTTOM > 897: const Type *bot = bottom_type(); Can we add constant folding handling when both dividend and divisor are constants. ------------- PR: https://git.openjdk.java.net/jdk/pull/7572 From duke at openjdk.java.net Thu Feb 24 14:26:39 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Thu, 24 Feb 2022 14:26:39 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v4] In-Reply-To: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: > This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method > and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. 
Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Introduce frame::link_or_null() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7591/files - new: https://git.openjdk.java.net/jdk/pull/7591/files/5b7d6004..1cc247d7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=02-03 Stats: 33 lines in 8 files changed: 24 ins; 2 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/7591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591 PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Thu Feb 24 14:36:05 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Thu, 24 Feb 2022 14:36:05 GMT Subject: Integrated: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 In-Reply-To: References: Message-ID: On Mon, 21 Feb 2022 14:43:27 GMT, Johannes Bechberger wrote: > Fixes the mentioned bug by replacing the check in AsyncGetCallTrace using the newly introduced method `JavaThread::thread_from_jni_environment`. This pull request has now been integrated. 
Changeset: 231e48fa Author: Johannes Bechberger Committer: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/231e48fa63aeb4e35c7c948f958695d62b7157ce Stats: 9 lines in 2 files changed: 3 ins; 3 del; 3 mod 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422 Reviewed-by: dholmes, mdoerr, kevinw ------------- PR: https://git.openjdk.java.net/jdk/pull/7559 From duke at openjdk.java.net Thu Feb 24 14:43:58 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Thu, 24 Feb 2022 14:43:58 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v4] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Thu, 24 Feb 2022 14:26:39 GMT, Johannes Bechberger wrote: >> This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method >> and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Introduce frame::link_or_null() I changed it again, introducing "frame::link_or_null()" that is the safe version of "frame::link()". > About a test, I agree, that would be nice. But one would have to "fake" an invalid stack. Maybe a new error reporting test where one deliberately overwrites portions of the stack and then tries to print the stack. However, I imagine things could be brittle, because the OS may catch a stack overwrite first. It's not totally trivial, maybe something for a separate RFE? I think tests would be nice but also quite difficult. 
A simple test would be to allocate a frame with zero values for all entries and check that `os::is_first_C_frame` returns true and that `frame::link_or_null()` returns also null. Then the same with a good frame (pointing to sensible values). ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Thu Feb 24 14:50:40 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Thu, 24 Feb 2022 14:50:40 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v5] In-Reply-To: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: > This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method > and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. 
Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix compile warnings ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7591/files - new: https://git.openjdk.java.net/jdk/pull/7591/files/1cc247d7..e91bfeef Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=03-04 Stats: 7 lines in 4 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/7591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591 PR: https://git.openjdk.java.net/jdk/pull/7591 From jbhateja at openjdk.java.net Thu Feb 24 14:56:03 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 24 Feb 2022 14:56:03 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v7] In-Reply-To: References: Message-ID: On Thu, 24 Feb 2022 00:43:27 GMT, Sandhya Viswanathan wrote: > Also curious, how does the performance look with all these changes. Updated new perf numbers. ------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From stuefe at openjdk.java.net Thu Feb 24 16:38:10 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 24 Feb 2022 16:38:10 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v5] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Thu, 24 Feb 2022 14:50:40 GMT, Johannes Bechberger wrote: >> This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method >> and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. 
> > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix compile warnings Looks almost good now. Small remarks remain. src/hotspot/share/runtime/os.cpp line 1178: > 1176: > 1177: // Looks like all platforms can use the same function to check if C > 1178: // stack is walkable beyond current frame. This comment is somewhat weird and it - and the one at the prototype in os.hpp - could do with some massaging. But it's fine to do this in a different RFE. src/hotspot/share/runtime/os.cpp line 1193: > 1191: > 1192: uintptr_t usp = (uintptr_t)fr->sp(); > 1193: if ((usp & sp_align_mask) != 0 || SafeFetchN(fr->sp(), (intptr_t)0) == 0) return true; I'd use os::is_readable_ptr instead for easier readability. ------------- Changes requested by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Thu Feb 24 16:43:14 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Thu, 24 Feb 2022 16:43:14 GMT Subject: Integrated: 8277204: Implement PAC-RET branch protection on Linux/AArch64 In-Reply-To: References: Message-ID: On Wed, 10 Nov 2021 12:32:53 GMT, Alan Hayward wrote: > PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One > of its uses is to protect against ROP based attacks. This is done by > signing the Link Register whenever it is stored on the stack, and > authenticating the value when it is loaded back from the stack. If an > attacker were to try to change control flow by editing the stack then > the authentication check of the Link Register will fail, causing a > segfault when the function returns. > > On a system with PAC enabled, it is expected that all applications will > be compiled with ROP protection. Fedora 33 and upwards already provide > this.
By compiling for ARMv8.0, GCC and LLVM will only use the set of > PAC instructions that exist in the NOP space - on hardware without PAC, > these instructions act as NOPs, allowing backward compatibility for > negligible performance cost (2 NOPs per non-leaf function). > > Hardware is currently limited to the Apple M1 MacBooks. All testing has > been done within a Fedora Docker image. A run of SpecJVM showed no > difference to that of noise - which was surprising. > > The most important part of this patch is simply compiling using branch > protection provided by GCC/LLVM. This protects all C++ code from being > used in ROP attacks, removing all static ROP gadgets from use. > > The remainder of the patch adds ROP protection to runtime generated > code, in both stubs and compiled Java code. Attacks here are much harder > as ROP gadgets must be found dynamically at runtime. If/when AOT > compilation is added to JDK, then all stubs and compiled Java will be > susceptible to ROP gadgets being found by static analysis and therefore > potentially as vulnerable as C++ code. > > There are a number of places where the VM changes control flow by > rewriting the stack or otherwise. I've done some analysis as to how > these could also be used for attacks (which I didn't want to post here). > These areas can be protected by ensuring the pointers to various stubs and > entry points are stored in memory as signed pointers. These changes are > simple to make (they can be reduced to a type change in common code and > a few additional sign/auth calls in the backend), but there are a lot of them > and the total code change is fairly large. I'm happy to provide a few > work in progress patches. > > In order to match the security benefits of the Apple Arm64e ABI across > the whole of JDK, then all the changes mentioned above would be > required. This pull request has now been integrated.
Changeset: 6fab8a2d Author: Alan Hayward Committer: Andrew Dinn URL: https://git.openjdk.java.net/jdk/commit/6fab8a2d6a97dbd2ffceca275716d020cb9f1eea Stats: 1481 lines in 35 files changed: 574 ins; 32 del; 875 mod 8277204: Implement PAC-RET branch protection on Linux/AArch64 Reviewed-by: erikj, ihse, adinn, ngasson ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From adinn at openjdk.java.net Thu Feb 24 16:40:16 2022 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 24 Feb 2022 16:40:16 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v25] In-Reply-To: References: Message-ID: On Tue, 22 Feb 2022 14:35:19 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. 
This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Merge master > - Merge master > - Merge master > - Error on -XX:-PreserveFramePointer -XX:UseBranchProtection=pac-ret > - Add comments to enter calls > - Set PreserveFramePointer if use_rop_protection is set > - Merge enter_subframe into enter > - Review fixups > - Documentation updates > - Update copyrights to 2022 > - ... and 24 more: https://git.openjdk.java.net/jdk/compare/022d8070...c4e0ee31 Yup this looks good to me. I will sponsor.
------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From duke at openjdk.java.net Thu Feb 24 19:08:08 2022 From: duke at openjdk.java.net (Vamsi Parasa) Date: Thu, 24 Feb 2022 19:08:08 GMT Subject: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4] In-Reply-To: References: Message-ID: On Thu, 24 Feb 2022 14:13:47 GMT, Jatin Bhateja wrote: >> Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> fix 32bit build issues > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4408: > >> 4406: jmp(done); >> 4407: bind(neg_divisor_fastpath); >> 4408: // Fastpath for divisor < 0: > > How about checking if divisor is +ve or -ve constant and non-constant dividend in identity routine and setting a flag in IR node, which can be used to either emit fast / slow path in a new instruction selection pattern. It will save emitting redundant instructions. Thanks for suggesting the enhancement. This enhancement will be implemented as a part of https://bugs.openjdk.java.net/browse/JDK-8282365 > src/hotspot/share/opto/divnode.cpp line 881: > >> 879: return (phase->type( in(2) )->higher_equal(TypeLong::ONE)) ? in(1) : this; >> 880: } >> 881: //------------------------------Value------------------------------------------ > > Ideal transform to replace unsigned divide by cheaper logical right shift instruction if divisor is POW will be useful. Thanks for suggesting the enhancement. This enhancement will be implemented as a part of https://bugs.openjdk.java.net/browse/JDK-8282365 > src/hotspot/share/opto/divnode.cpp line 897: > >> 895: >> 896: // Either input is BOTTOM ==> the result is the local BOTTOM >> 897: const Type *bot = bottom_type(); > > Can we add constant folding handling when both dividend and divisor are constants. Thanks for suggesting the enhancement. 
This enhancement will be implemented as a part of https://bugs.openjdk.java.net/browse/JDK-8282365 ------------- PR: https://git.openjdk.java.net/jdk/pull/7572 From jbhateja at openjdk.java.net Fri Feb 25 06:22:42 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 25 Feb 2022 06:22:42 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v9] In-Reply-To: References: Message-ID: > Summary of changes: > - Intrinsify Math.round(float) and Math.round(double) APIs. > - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. > - Test creation using new IR testing framework. > > Following are the performance number of a JMH micro included with the patch > > Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) > > > Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio > -- | -- | -- | -- | -- | -- | -- | -- > FpRoundingBenchmark.test_round_double | 1024.00 | 504.15 | 2209.54 | 4.38 | 510.36 | 548.39 | 1.07 > FpRoundingBenchmark.test_round_double | 2048.00 | 293.64 | 1271.98 | 4.33 | 293.48 | 274.01 | 0.93 > FpRoundingBenchmark.test_round_float | 1024.00 | 825.99 | 4754.66 | 5.76 | 751.83 | 2274.13 | 3.02 > FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | 388.52 | 1334.18 | 3.43 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8279508: Adding descriptive comments. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7094/files - new: https://git.openjdk.java.net/jdk/pull/7094/files/f7dec3d9..54d4ea36 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7094&range=07-08 Stats: 31 lines in 2 files changed: 14 ins; 0 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/7094.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7094/head:pull/7094 PR: https://git.openjdk.java.net/jdk/pull/7094 From duke at openjdk.java.net Fri Feb 25 07:30:05 2022 From: duke at openjdk.java.net (KIRIYAMA Takuya) Date: Fri, 25 Feb 2022 07:30:05 GMT Subject: RFR: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. [v4] In-Reply-To: References: Message-ID: <_H_syP7hDZ6iLNt8M8qOL48M2y6xdA28AR0gWzvt6Yw=.4e39ac63-dfe7-4554-b737-174491996544@github.com> On Tue, 22 Feb 2022 05:53:31 GMT, KIRIYAMA Takuya wrote: >> I think JFR should report an error message and jvm should shut down safely instead of a guarantee failure. >> >> For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops jvm as below >> by using JfrJavaSupport::abort(). >> >> [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) >> [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) >> [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... >> >> I modified StreamWriterHost not to call guarantee failure but to call JfrJavaSupport::abort(). >> I added an argument to JfrJavaSupport::abort() which tells os::abort() not to produce a core dump >> because there is no space on device. >> Could you please review the fix?
> > KIRIYAMA Takuya has updated the pull request incrementally with one additional commit since the last revision: > > 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. I hope this change is integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From duke at openjdk.java.net Fri Feb 25 11:34:08 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Fri, 25 Feb 2022 11:34:08 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v5] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Thu, 24 Feb 2022 16:25:19 GMT, Thomas Stuefe wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix compile warnings > > src/hotspot/share/runtime/os.cpp line 1178: > >> 1176: >> 1177: // Looks like all platforms can use the same function to check if C >> 1178: // stack is walkable beyond current frame. > > This comment is somewhat weird and it - and the one at the prototype in os.hpp - could do with some massaging. But it's fine to do this in a different RFE. yes... ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Fri Feb 25 11:43:20 2022 From: duke at openjdk.java.net (KIRIYAMA Takuya) Date: Fri, 25 Feb 2022 11:43:20 GMT Subject: Integrated: 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. In-Reply-To: References: Message-ID: On Wed, 26 Jan 2022 06:41:41 GMT, KIRIYAMA Takuya wrote: > I think JFR should report an error message and jvm should shut down safely instead of a guarantee failure. > > For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops jvm as below > by using JfrJavaSupport::abort().
> > [0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp) > [0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM... > > I modified StreamWriterHost not to call guarantee failure but to call JfrJavaSupport::abort(). > I added an argument to JfrJavaSupport::abort() which tells os::abort() not to produce a core dump > because there is no space on device. > Could you please review the fix? This pull request has now been integrated. Changeset: 9471f24c Author: KIRIYAMA Takuya Committer: Markus Grönlund URL: https://git.openjdk.java.net/jdk/commit/9471f24ca191832669a13e5a1ea73f7097a25927 Stats: 16 lines in 3 files changed: 8 ins; 2 del; 6 mod 8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. Reviewed-by: mgronlun ------------- PR: https://git.openjdk.java.net/jdk/pull/7227 From stuefe at openjdk.java.net Fri Feb 25 12:30:00 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 25 Feb 2022 12:30:00 GMT Subject: Integrated: JDK-8281015: Further simplify NMT backend In-Reply-To: References: Message-ID: On Mon, 31 Jan 2022 08:12:02 GMT, Thomas Stuefe wrote: > NMT backend can be further simplified and cleaned out. > > - some entry points require NMT_TrackingLevel as arguments, some use the global tracking level. Ultimately, every part of NMT always uses the global tracking level, so in many cases the explicit parameter can be removed and the global tracking level can be used instead.
> - `MemTracker::malloc_header_size(level)` + `MemTracker::malloc_footer_size(level)` are fused into `MemTracker::overhead_per_malloc()` > - when adding to `MallocSiteTable`, caller gets back a shortcut to the entry. That shortcut is stored verbatim in the malloc header. It consists of two 16-bit values (bucket index and chain position). That tuple finds its way into many argument lists. It can be simplified into a single 32-bit opaque marker. Code outside the MallocSiteTable does not need to know what it is. > - Currently, the `MallocHeader` class contains a lot of logic. It accounts (in constructor) and de-accounts (in `MallocHeader::release()`). It would simplify code if `MallocHeader` were just a dumb data carrier and the `MallocTracker` would do the actual work. > - `MallocHeader` can be simplified, almost all members made constant and modifying accessors removed. > - In some places we handle inputptr=NULL gracefully where we should assert instead > - Expressions like `MemTracker::tracking_level() != NMT_off` can be simplified to `MemTracker::enabled()`. > - MemTracker::malloc_base (all variants) can be removed. Note that we have MallocTracker::malloc_header, which achieves the same and does not require casting to the header. > > Testing: > > - GHAs > - manually ran NMT gtests (all NMT modes) and NMT jtreg tests on Ubuntu x64 > - SAP nightlies ran through. Note that since 8275301 "Unify C-heap buffer overrun checks into NMT" NMT is enabled by default in debug builds, so it gets a lot more workout in tests now. > > Note that I wanted to manually verify that the gdb "call pp" command still works in order to not break Zhengyu's recent addition, but found it's already broken. I filed https://bugs.openjdk.java.net/browse/JDK-8281023 and am preparing a separate patch. This pull request has now been integrated.
Changeset: b96b7437 Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/b96b743727a628c1b33cc9b3374f010c2ea30b78 Stats: 273 lines in 10 files changed: 56 ins; 147 del; 70 mod 8281015: Further simplify NMT backend Reviewed-by: zgu, mbaesken ------------- PR: https://git.openjdk.java.net/jdk/pull/7283 From duke at openjdk.java.net Fri Feb 25 12:31:30 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Fri, 25 Feb 2022 12:31:30 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v6] In-Reply-To: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: > This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method > and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. 
Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Simple test - Use os::is_readable_pointer ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7591/files - new: https://git.openjdk.java.net/jdk/pull/7591/files/e91bfeef..2d29a6db Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=04-05 Stats: 75 lines in 5 files changed: 63 ins; 3 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/7591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591 PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Fri Feb 25 12:35:38 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Fri, 25 Feb 2022 12:35:38 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v7] In-Reply-To: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: <888gPfj0r4fbAmhIWUIqbv91hS1la5fK8h-9lPodA1E=.dd2071e3-a0fd-4ca3-84c4-b7bfddd49c51@github.com> > This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method > and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. 
Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix trailing whitespace ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7591/files - new: https://git.openjdk.java.net/jdk/pull/7591/files/2d29a6db..7ee0c0b8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591 PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Fri Feb 25 12:41:30 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Fri, 25 Feb 2022 12:41:30 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v8] In-Reply-To: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: > This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method > and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. 
Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Correct mistake ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7591/files - new: https://git.openjdk.java.net/jdk/pull/7591/files/7ee0c0b8..de36fd68 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=06-07 Stats: 34 lines in 1 file changed: 0 ins; 33 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591 PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Fri Feb 25 13:02:37 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Fri, 25 Feb 2022 13:02:37 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v9] In-Reply-To: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: > This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method > and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. 
Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix tests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7591/files - new: https://git.openjdk.java.net/jdk/pull/7591/files/de36fd68..1f08203f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=07-08 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591 PR: https://git.openjdk.java.net/jdk/pull/7591 From dholmes at openjdk.java.net Fri Feb 25 13:12:04 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 25 Feb 2022 13:12:04 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v25] In-Reply-To: <0Ob4kezo_Q0ro0eF_OeEABrzYeZCNmoaD5KQUcBpZRc=.6c772f45-c70c-4983-880a-8878e281d04b@github.com> References: <0Ob4kezo_Q0ro0eF_OeEABrzYeZCNmoaD5KQUcBpZRc=.6c772f45-c70c-4983-880a-8878e281d04b@github.com> Message-ID: On Thu, 24 Feb 2022 10:52:00 GMT, Alan Hayward wrote: >> Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: >> >> - Merge master >> - Merge master >> - Merge master >> - Error on -XX:-PreserveFramePointer -XX:UseBranchProtection=pac-ret >> - Add comments to enter calls >> - Set PreserveFramePointer if use_rop_protection is set >> - Merge enter_subframe into enter >> - Review fixups >> - Documentation updates >> - Update copyrights to 2022 >> - ... and 24 more: https://git.openjdk.java.net/jdk/compare/022d8070...c4e0ee31 > > Any more comments? 
Otherwise I'll integrate later @a74nh this seems to have broken the Zero build: src/hotspot/share/gc/shared/barrierSetNMethod.cpp:58:33: error: 'pauth_strip_pointer' was not declared in this scope 58 | AARCH64_ONLY(return_address = pauth_strip_pointer(return_address)); I'm guessing a missing include file. ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From hseigel at openjdk.java.net Fri Feb 25 14:24:00 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 25 Feb 2022 14:24:00 GMT Subject: RFR: 8281472: JVM options processing silently truncates large illegal options values [v2] In-Reply-To: References: Message-ID: > Please review this change to fix JDK-8281472. The fix prevents truncation of large illegal option values by rejecting those values if they exceed the range of their type. For example, it rejects values of int options that are not between max_int and min_int. > > The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux-x64 and Windows-x64. 
> > Thanks, Harold Harold Seigel has updated the pull request incrementally with one additional commit since the last revision: add gtest, fix TestParallelGCThreads.java, and revise implementation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7522/files - new: https://git.openjdk.java.net/jdk/pull/7522/files/354e3f5c..e8de1741 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7522&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7522&range=00-01 Stats: 262 lines in 5 files changed: 111 ins; 133 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/7522.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7522/head:pull/7522 PR: https://git.openjdk.java.net/jdk/pull/7522 From hseigel at openjdk.java.net Fri Feb 25 14:24:01 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 25 Feb 2022 14:24:01 GMT Subject: RFR: 8281472: JVM options processing silently truncates large illegal options values In-Reply-To: References: Message-ID: On Thu, 17 Feb 2022 19:09:26 GMT, Harold Seigel wrote: > Please review this change to fix JDK-8281472. The fix prevents truncation of large illegal option values by rejecting those values if they exceed the range of their type. For example, it rejects values of int options that are not between max_int and min_int. > > The fix was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-5 on Linux-x64 and Windows-x64. > > Thanks, Harold This new commit replaces the JTReg test with a gtest as suggested by David, has a revised implementation suggested by Ioi, and fixes TestParallelGCThreads.java.
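For readers unfamiliar with the failure mode, the idea of rejecting out-of-range values instead of silently truncating them can be sketched in plain C++ as below. This is only an illustration under stated assumptions: `parse_int_option` is a hypothetical helper, not the function in the patch, and it uses the C library's `strtoll` rather than whatever the HotSpot argument parser does internally.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not the HotSpot implementation): parse a decimal
// option value and accept it only if it fits into an int.
static bool parse_int_option(const char* text, int* result) {
  if (text == nullptr || *text == '\0') {
    return false;                       // nothing to parse
  }
  errno = 0;
  char* end = nullptr;
  long long v = std::strtoll(text, &end, 10);
  if (errno == ERANGE) {
    return false;                       // does not even fit into long long
  }
  if (end == text || *end != '\0') {
    return false;                       // no digits, or trailing garbage
  }
  if (v < INT_MIN || v > INT_MAX) {
    return false;                       // would be truncated when narrowed to int
  }
  *result = static_cast<int>(v);
  return true;
}
```

The range check happens on the wide value before narrowing, which is the point of the fix described above: a value such as 2147483648 is reported as illegal rather than wrapped.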
------------- PR: https://git.openjdk.java.net/jdk/pull/7522 From coleenp at openjdk.java.net Fri Feb 25 15:01:53 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 25 Feb 2022 15:01:53 GMT Subject: RFR: 8275731: CDS archived enums objects are recreated at runtime [v6] In-Reply-To: References: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> Message-ID: On Wed, 23 Feb 2022 04:15:28 GMT, Ioi Lam wrote: >> **Background:** >> >> In the Java Language, Enums can be tested for equality, so the constants in an Enum type must be unique. Javac compiles an enum declaration like this: >> >> >> public enum Day { SUNDAY, MONDAY ... } >> >> >> to >> >> >> public class Day extends java.lang.Enum { >> public static final Day SUNDAY = new Day("SUNDAY"); >> public static final Day MONDAY = new Day("MONDAY"); ... >> } >> >> >> With CDS archived heap objects, `Day::<clinit>` is executed twice: once during `java -Xshare:dump`, and once during normal JVM execution. If the archived heap objects reference one of the Enum constants created at dump time, we will violate the uniqueness requirements of the Enum constants at runtime. See the test case in the description of [JDK-8275731](https://bugs.openjdk.java.net/browse/JDK-8275731) >> >> **Fix:** >> >> During -Xshare:dump, if we discover that an Enum constant of type X is archived, we archive all constants of type X. At Runtime, type X will skip the normal execution of `X::<clinit>`. Instead, we run `HeapShared::initialize_enum_klass()` to retrieve all the constants of X that were saved at dump time. >> >> This is safe as we know that `X::<clinit>` has no observable side effect -- it only creates the constants of type X, as well as the synthetic value `X::$VALUES`, which cannot be observed until X is fully initialized.
>> >> **Verification:** >> >> To avoid future problems, I added a new tool, CDSHeapVerifier, to look for similar problems where the archived heap objects reference a static field that may be recreated at runtime. There are some manual steps involved, but I analyzed the potential problems found by the tool and they are all safe (after the current bug is fixed). See cdsHeapVerifier.cpp for gory details. An example trace of this tool can be found at https://bugs.openjdk.java.net/secure/attachment/97242/enum_warning.txt >> >> **Testing:** >> >> Passed Oracle CI tiers 1-4. Will run tier 5 as well. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed whitespace Sorry for the long delay. It's a big change, but a lot in debug so that's ok. Looks good. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6653 From coleenp at openjdk.java.net Fri Feb 25 15:01:54 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 25 Feb 2022 15:01:54 GMT Subject: RFR: 8275731: CDS archived enums objects are recreated at runtime [v3] In-Reply-To: <4CLwCQdc_haGT_ueBQGZKzJVasGK26B6iYcO7VtOfAs=.02f3deb9-7ac7-45fd-9a7c-37b0fe4a8ea2@github.com> References: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> <7c6mh2-s3SkpfGG1WptyZsJjTfcDy1wX0Ll0713MLkU=.7df74a01-7ea5-49c1-9bda-f73798df3852@github.com> <4CLwCQdc_haGT_ueBQGZKzJVasGK26B6iYcO7VtOfAs=.02f3deb9-7ac7-45fd-9a7c-37b0fe4a8ea2@github.com> Message-ID: <1B2f3fl1vAMMiwyVKdf5rmn_kmJFhYxXFg71WAkILbw=.22926dc1-4234-4f21-98ee-64b5372c00c6@github.com> On Wed, 19 Jan 2022 05:44:10 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/heapShared.cpp line 433: >> >>> 431: oop mirror = k->java_mirror(); >>> 432: int i = 0; >>> 433: for (JavaFieldStream fs(k); !fs.done(); fs.next()) { >> >> This seems like it should also use InstanceKlass::do_local_static_fields.
> > Converting this to InstanceKlass::do_nonstatic_fields() is difficult because the loop body references 7 different variables declared outside of the loop. > > One thing I tried is to add a new version of do_nonstatic_fields2() that supports C++ lambdas. You can see my experiment from here: > > https://github.com/openjdk/jdk/compare/master...iklam:lambda-for-instanceklass-do_local_static_fields2?expand=1 > > I changed all my new code to use the do_nonstatic_fields2() function with lambda. Ok, if it requires lambdas and additional change, never mind then. ------------- PR: https://git.openjdk.java.net/jdk/pull/6653 From duke at openjdk.java.net Fri Feb 25 15:20:04 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Fri, 25 Feb 2022 15:20:04 GMT Subject: RFR: 8277204: Implement PAC-RET branch protection on Linux/AArch64 [v25] In-Reply-To: References: Message-ID: On Tue, 22 Feb 2022 14:35:19 GMT, Alan Hayward wrote: >> PAC is an optional feature in AArch64 8.3 and is compulsory in v9. One >> of its uses is to protect against ROP based attacks. This is done by >> signing the Link Register whenever it is stored on the stack, and >> authenticating the value when it is loaded back from the stack. If an >> attacker were to try to change control flow by editing the stack then >> the authentication check of the Link Register will fail, causing a >> segfault when the function returns. >> >> On a system with PAC enabled, it is expected that all applications will >> be compiled with ROP protection. Fedora 33 and upwards already provide >> this. By compiling for ARMv8.0, GCC and LLVM will only use the set of >> PAC instructions that exist in the NOP space - on hardware without PAC, >> these instructions act as NOPs, allowing backward compatibility for >> negligible performance cost (2 NOPs per non-leaf function). >> >> Hardware is currently limited to the Apple M1 MacBooks. All testing has >> been done within a Fedora Docker image. 
A run of SpecJVM showed no >> difference to that of noise - which was surprising. >> >> The most important part of this patch is simply compiling using branch >> protection provided by GCC/LLVM. This protects all C++ code from being >> used in ROP attacks, removing all static ROP gadgets from use. >> >> The remainder of the patch adds ROP protection to runtime generated >> code, in both stubs and compiled Java code. Attacks here are much harder >> as ROP gadgets must be found dynamically at runtime. If/when AOT >> compilation is added to JDK, then all stubs and compiled Java will be >> susceptible to ROP gadgets being found by static analysis and therefore >> potentially as vulnerable as C++ code. >> >> There are a number of places where the VM changes control flow by >> rewriting the stack or otherwise. I've done some analysis as to how >> these could also be used for attacks (which I didn't want to post here). >> These areas can be protected by ensuring the pointers to various stubs and >> entry points are stored in memory as signed pointers. These changes are >> simple to make (they can be reduced to a type change in common code and >> a few additional sign/auth calls in the backend), but there are a lot of them >> and the total code change is fairly large. I'm happy to provide a few >> work in progress patches. >> >> In order to match the security benefits of the Apple Arm64e ABI across >> the whole of JDK, then all the changes mentioned above would be >> required. > > Alan Hayward has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Merge master > - Merge master > - Merge master > - Error on -XX:-PreserveFramePointer -XX:UseBranchProtection=pac-ret > - Add comments to enter calls > - Set PreserveFramePointer if use_rop_protection is set > - Merge enter_subframe into enter > - Review fixups > - Documentation updates > - Update copyrights to 2022 > - ...
and 24 more: https://git.openjdk.java.net/jdk/compare/022d8070...c4e0ee31 Yes, we spotted this today too. https://bugs.openjdk.java.net/browse/JDK-8282392 My initial thought was that I needed to add a pauth header file with stub functions to linux_zero/. Which does feel a little awkward. AARCH64_PORT_ONLY does sound like a better option. Thankfully there is no need for full pac support in zero too.... :) ------------- PR: https://git.openjdk.java.net/jdk/pull/6334 From pchilanomate at openjdk.java.net Fri Feb 25 17:00:07 2022 From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo) Date: Fri, 25 Feb 2022 17:00:07 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only [v2] In-Reply-To: References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> Message-ID: On Thu, 24 Feb 2022 14:03:49 GMT, Coleen Phillimore wrote: >> Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. >> Tested with tier1 on Oracle platforms. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Enhance comment to say why name needs to be set later. > - 8282240: Add _name field to Method for NOT_PRODUCT only > - Merge branch 'master' into method-name > - Enhance comment to say why name needs to be set later. > - 8282240: Add _name field to Method for NOT_PRODUCT only Hi Coleen, Looks good to me. I see we also call set_constants() and set_name_index() from VM_RedefineClasses::set_new_constant_pool(), do we need to set the name in there too or it's not needed? Thanks, Patricio ------------- Marked as reviewed by pchilanomate (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7608 From hseigel at openjdk.java.net Fri Feb 25 18:18:10 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 25 Feb 2022 18:18:10 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only [v2] In-Reply-To: References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> Message-ID: On Thu, 24 Feb 2022 14:03:49 GMT, Coleen Phillimore wrote: >> Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. >> Tested with tier1 on Oracle platforms. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Enhance comment to say why name needs to be set later. > - 8282240: Add _name field to Method for NOT_PRODUCT only > - Merge branch 'master' into method-name > - Enhance comment to say why name needs to be set later. > - 8282240: Add _name field to Method for NOT_PRODUCT only Changes look good! Thanks, Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7608 From coleenp at openjdk.java.net Fri Feb 25 20:19:26 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 25 Feb 2022 20:19:26 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only [v3] In-Reply-To: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> Message-ID: > Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. > Tested with tier1 on Oracle platforms. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Set _name field in constructor with available symbol rather than later when constant pool pointer is set. I like this one better. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7608/files - new: https://git.openjdk.java.net/jdk/pull/7608/files/ab762ed7..f75cf1e2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7608&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7608&range=01-02 Stats: 43 lines in 5 files changed: 28 ins; 7 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/7608.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7608/head:pull/7608 PR: https://git.openjdk.java.net/jdk/pull/7608 From coleenp at openjdk.java.net Fri Feb 25 20:19:32 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 25 Feb 2022 20:19:32 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only [v2] In-Reply-To: References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> Message-ID: On Thu, 24 Feb 2022 14:03:49 GMT, Coleen Phillimore wrote: >> Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. >> Tested with tier1 on Oracle platforms. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Enhance comment to say why name needs to be set later. > - 8282240: Add _name field to Method for NOT_PRODUCT only > - Merge branch 'master' into method-name > - Enhance comment to say why name needs to be set later. > - 8282240: Add _name field to Method for NOT_PRODUCT only Hi Patricio, In the RedefineClasses case, the methods already have the _name field set. 
But your comment pointed out the fragility of this change, so I changed it. Hope this one is better. Please re-review. Harold, can you re-review also? ------------- PR: https://git.openjdk.java.net/jdk/pull/7608 From hseigel at openjdk.java.net Fri Feb 25 20:57:55 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 25 Feb 2022 20:57:55 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only [v3] In-Reply-To: References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> Message-ID: On Fri, 25 Feb 2022 20:19:26 GMT, Coleen Phillimore wrote: >> Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. >> Tested with tier1 on Oracle platforms. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Set _name field in constructor with available symbol rather than later when constant pool pointer is set. I like this one better. Still looks good! Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7608 From pchilanomate at openjdk.java.net Fri Feb 25 21:37:54 2022 From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo) Date: Fri, 25 Feb 2022 21:37:54 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only [v3] In-Reply-To: References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> Message-ID: On Fri, 25 Feb 2022 20:19:26 GMT, Coleen Phillimore wrote: >> Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. >> Tested with tier1 on Oracle platforms. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Set _name field in constructor with available symbol rather than later when constant pool pointer is set. 
I like this one better. Still good! Thanks, Patricio ------------- Marked as reviewed by pchilanomate (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7608 From sviswanathan at openjdk.java.net Sat Feb 26 01:33:54 2022 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Sat, 26 Feb 2022 01:33:54 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v9] In-Reply-To: References: Message-ID: On Fri, 25 Feb 2022 06:22:42 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- | -- >> FpRoundingBenchmark.test_round_double | 1024.00 | 504.15 | 2209.54 | 4.38 | 510.36 | 548.39 | 1.07 >> FpRoundingBenchmark.test_round_double | 2048.00 | 293.64 | 1271.98 | 4.33 | 293.48 | 274.01 | 0.93 >> FpRoundingBenchmark.test_round_float | 1024.00 | 825.99 | 4754.66 | 5.76 | 751.83 | 2274.13 | 3.02 >> FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | 388.52 | 1334.18 | 3.43 >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8279508: Adding descriptive comments. Other than this the patch looks good to me. What testing have you done? 
src/hotspot/cpu/x86/x86.ad line 7263: > 7261: __ vector_round_float_avx($dst$$XMMRegister, $src$$XMMRegister, $xtmp1$$XMMRegister, > 7262: $xtmp2$$XMMRegister, $xtmp3$$XMMRegister, $xtmp4$$XMMRegister, > 7263: ExternalAddress(vector_float_signflip()), new_mxcsr, $scratch$$Register, vlen_enc); The vector_float_signflip() here should be replaced by vector_all_bits_set(). cvtps2dq description: If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (2w-1, where w represents the number of bits in the destination format) is returned. src/hotspot/cpu/x86/x86.ad line 7280: > 7278: __ vector_round_float_evex($dst$$XMMRegister, $src$$XMMRegister, $xtmp1$$XMMRegister, > 7279: $xtmp2$$XMMRegister, $ktmp1$$KRegister, $ktmp2$$KRegister, > 7280: ExternalAddress(vector_float_signflip()), new_mxcsr, $scratch$$Register, vlen_enc); The vector_float_signflip() here should be replaced by vector_all_bits_set(). src/hotspot/cpu/x86/x86.ad line 7295: > 7293: __ vector_round_double_evex($dst$$XMMRegister, $src$$XMMRegister, $xtmp1$$XMMRegister, > 7294: $xtmp2$$XMMRegister, $ktmp1$$KRegister, $ktmp2$$KRegister, > 7295: ExternalAddress(vector_double_signflip()), new_mxcsr, $scratch$$Register, vlen_enc); The vector_double_signflip() here should be replaced by vector_all_bits_set(). vcvtpd2qq description: If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (2w-1, where w represents the number of bits in the destination format) is returned. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From sviswanathan at openjdk.java.net Sat Feb 26 03:05:54 2022 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Sat, 26 Feb 2022 03:05:54 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v9] In-Reply-To: References: Message-ID: <1K0c0y8K8bVNJEFMyTQSxwdgJlx9E2N8uhHC7O9sfyM=.c4ead8b5-abe0-42f4-ae10-aa24425eb75d@github.com> On Sat, 26 Feb 2022 01:06:21 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8279508: Adding descriptive comments. > > src/hotspot/cpu/x86/x86.ad line 7263: > >> 7261: __ vector_round_float_avx($dst$$XMMRegister, $src$$XMMRegister, $xtmp1$$XMMRegister, >> 7262: $xtmp2$$XMMRegister, $xtmp3$$XMMRegister, $xtmp4$$XMMRegister, >> 7263: ExternalAddress(vector_float_signflip()), new_mxcsr, $scratch$$Register, vlen_enc); > > The vector_float_signflip() here should be replaced by vector_all_bits_set(). > cvtps2dq description: > If a converted result cannot be represented in the destination > format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value > (2w-1, where w represents the number of bits in the destination format) is returned. Clarification, the number in my comments above is (2^w - 1). This is from Intel SDM (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html). Also you will need to take care when the valid unoverflowed result is -1 i.e. 0xFFFFFFFF (2^32 - 1). 
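To make the concern about a legitimate result of -1 concrete, here is a small standalone C++ sketch of the bit patterns involved for w = 32. It makes no claim about what the hardware returns; it only evaluates the arithmetic. The constant names are made up for illustration, and the assumption that a "signflip" mask holds only the sign bit (0x80000000) per lane is mine, not something stated in this message.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a signed two's-complement integer.
static int32_t as_int32(uint32_t bits) {
  int32_t v;
  std::memcpy(&v, &bits, sizeof v);
  return v;
}

constexpr int kWidth = 32;  // w: number of bits in the destination format

// 2^w - 1: all bits set, i.e. 0xFFFFFFFF.
constexpr uint32_t kTwoPowWMinusOne =
    static_cast<uint32_t>((uint64_t{1} << kWidth) - 1);

// Sign bit only, i.e. 0x80000000 (assumed content of a signflip mask lane).
constexpr uint32_t kSignBitOnly = uint32_t{1} << (kWidth - 1);

// A perfectly valid rounding result that has the all-bits-set pattern.
static long round_of_minus_one() {
  return std::lround(-1.0f);
}
```

Read as a signed int, `kTwoPowWMinusOne` is -1, the same value that rounding -1.0f produces, so a check against that pattern alone cannot tell an overflow marker from a valid result. `kSignBitOnly` reads as INT32_MIN.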
------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From duke at openjdk.java.net Sat Feb 26 03:42:02 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Sat, 26 Feb 2022 03:42:02 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v9] In-Reply-To: <8mhsd-DL1IccFiqrRigKdck8OJg79sjKgaYXrHc4zwY=.c92cb7f5-8e54-42ab-84f1-9cfa1ce76779@github.com> References: <1K0c0y8K8bVNJEFMyTQSxwdgJlx9E2N8uhHC7O9sfyM=.c4ead8b5-abe0-42f4-ae10-aa24425eb75d@github.com> <8mhsd-DL1IccFiqrRigKdck8OJg79sjKgaYXrHc4zwY=.c92cb7f5-8e54-42ab-84f1-9cfa1ce76779@github.com> Message-ID: On Sat, 26 Feb 2022 03:37:32 GMT, Quan Anh Mai wrote: >> Clarification, the number in my comments above is (2^w - 1). This is from Intel SDM (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html). >> Also you will need to take care when the valid unoverflowed result is -1 i.e. 0xFFFFFFFF (2^32 - 1). > > I believe the indefinite value should be 2^(w - 1) (a.k.a 0x80000000) and the documentation is typoed. If you look at `cvtss2si`, the indefinite value is also written as 2^w - 1 but yet in `MacroAssembler::convert_f2i` we compare it with 0x80000000. In addition, choosing -1 as an indefinite value is weird enough and to complicate it as 2^w - 1 is really unusual. 
`MacroAssembler::convert_f2i` https://github.com/openjdk/jdk/blob/c5c6058fd57d4b594012035eaf18a57257f4ad85/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L8919 ------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From duke at openjdk.java.net Sat Feb 26 03:42:02 2022 From: duke at openjdk.java.net (Quan Anh Mai) Date: Sat, 26 Feb 2022 03:42:02 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v9] In-Reply-To: <1K0c0y8K8bVNJEFMyTQSxwdgJlx9E2N8uhHC7O9sfyM=.c4ead8b5-abe0-42f4-ae10-aa24425eb75d@github.com> References: <1K0c0y8K8bVNJEFMyTQSxwdgJlx9E2N8uhHC7O9sfyM=.c4ead8b5-abe0-42f4-ae10-aa24425eb75d@github.com> Message-ID: <8mhsd-DL1IccFiqrRigKdck8OJg79sjKgaYXrHc4zwY=.c92cb7f5-8e54-42ab-84f1-9cfa1ce76779@github.com> On Sat, 26 Feb 2022 03:02:51 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/x86.ad line 7263: >> >>> 7261: __ vector_round_float_avx($dst$$XMMRegister, $src$$XMMRegister, $xtmp1$$XMMRegister, >>> 7262: $xtmp2$$XMMRegister, $xtmp3$$XMMRegister, $xtmp4$$XMMRegister, >>> 7263: ExternalAddress(vector_float_signflip()), new_mxcsr, $scratch$$Register, vlen_enc); >> >> The vector_float_signflip() here should be replaced by vector_all_bits_set(). >> cvtps2dq description: >> If a converted result cannot be represented in the destination >> format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value >> (2w-1, where w represents the number of bits in the destination format) is returned. > > Clarification, the number in my comments above is (2^w - 1). This is from Intel SDM (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html). > Also you will need to take care when the valid unoverflowed result is -1 i.e. 0xFFFFFFFF (2^32 - 1). I believe the indefinite value should be 2^(w - 1) (a.k.a 0x80000000) and the documentation is typoed. 
If you look at `cvtss2si`, the indefinite value is also written as 2^w - 1 but yet in `MacroAssembler::convert_f2i` we compare it with 0x80000000. In addition, choosing -1 as an indefinite value is weird enough and to complicate it as 2^w - 1 is really unusual. ------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From jbhateja at openjdk.java.net Sat Feb 26 04:57:55 2022 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Sat, 26 Feb 2022 04:57:55 GMT Subject: RFR: 8279508: Auto-vectorize Math.round API [v9] In-Reply-To: References: Message-ID: On Fri, 25 Feb 2022 06:22:42 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- | -- >> FpRoundingBenchmark.test_round_double | 1024.00 | 504.15 | 2209.54 | 4.38 | 510.36 | 548.39 | 1.07 >> FpRoundingBenchmark.test_round_double | 2048.00 | 293.64 | 1271.98 | 4.33 | 293.48 | 274.01 | 0.93 >> FpRoundingBenchmark.test_round_float | 1024.00 | 825.99 | 4754.66 | 5.76 | 751.83 | 2274.13 | 3.02 >> FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | 388.52 | 1334.18 | 3.43 >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8279508: Adding descriptive comments. As per SDM, if post conversion a floating point number is non-representable in destination format e.g. 
a floating point value 3.4028235E10 post integer conversion will overflow the value range of the integer primitive type, hence a -0.0 value or 0x80000000 is returned here. Similarly, for +/- NaN and +/-Inf the value returned post conversion is -0.0. All these cases, i.e. floating point values that are non-representable post conversion and NaN/Inf values, are handled in a special manner: the algorithm first performs an unordered comparison of the original source value and returns a 0 in case of NaN, which weeds out the NaN case. For the rest of the special values we check the MSB of the source and return either Integer.MAX_VALUE for +ve numbers or Integer.MIN_VALUE for -ve numbers, to adhere to the semantics of the Math.round API. Existing tests were enhanced to cover various special cases (NaN/Inf/+ve/-ve value/values which may be inexact after adding 0.5/ values which post conversion overflow integer value range). ------------- PR: https://git.openjdk.java.net/jdk/pull/7094 From stuefe at openjdk.java.net Sat Feb 26 06:18:56 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 26 Feb 2022 06:18:56 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v9] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Fri, 25 Feb 2022 13:02:37 GMT, Johannes Bechberger wrote: >> This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method >> and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix tests Hi Johannes, Getting closer. More remarks inline.
Cheers, Thomas src/hotspot/cpu/aarch64/frame_aarch64.inline.hpp line 154: > 152: > 153: inline intptr_t* frame::link_or_null() const { > 154: auto ptr = (intptr_t **)addr_at(link_offset); Please don't use auto. In general, use features and style that is adapted around you, and beyond that pls refer to the C++ style guide. When in Rome... src/hotspot/cpu/aarch64/frame_aarch64.inline.hpp line 155: > 153: inline intptr_t* frame::link_or_null() const { > 154: auto ptr = (intptr_t **)addr_at(link_offset); > 155: if (os::is_readable_pointer((const void*)ptr)) { You don't need this cast src/hotspot/cpu/aarch64/frame_aarch64.inline.hpp line 159: > 157: } > 158: return NULL; > 159: } You could shorten these four lines to a single one using `?`, especially since this code is duplicated across platforms. src/hotspot/share/runtime/os.cpp line 1179: > 1177: // Looks like all platforms can use the same function to check if C > 1178: // stack is walkable beyond current frame. > 1179: // Returns false if this is the cas typo src/hotspot/share/runtime/os.cpp line 1184: > 1182: #ifdef _WINDOWS > 1183: return true; // native stack isn't walkable on windows this way. > 1184: #else This change has nothing to do with the bug. I would leave this as it is and let the code below at least compile on windows. Then we know it does not bitrot there. I am also not clear why this would not work on windows, since we could optionally build with framepointers enabled, right? And don't we have frame pointers on 32-bit windows always? I may remember this wrong. 
src/hotspot/share/runtime/os.cpp line 1193: > 1191: > 1192: uintptr_t usp = (uintptr_t)fr->sp(); > 1193: if ((usp & sp_align_mask) != 0 || !os::is_readable_pointer((const void*)usp)) return true; remove cast test/hotspot/gtest/runtime/test_os.cpp line 874: > 872: frame invalid_frame; > 873: EXPECT_TRUE(os::is_first_C_frame(&invalid_frame)); // the frame has zeroes for all values > 874: Please add a test with valid looking but garbage pointers, to test that your safefetch really works. We usually do this by reserving + protecting a stripe of memory and using that one as guaranteed faulting pointer. test/hotspot/gtest/runtime/test_os.cpp line 875: > 873: EXPECT_TRUE(os::is_first_C_frame(&invalid_frame)); // the frame has zeroes for all values > 874: > 875: auto cur_frame = os::current_frame(); // this frame has to have a sender please use a type here, not auto test/hotspot/gtest/runtime/test_os.cpp line 878: > 876: EXPECT_FALSE(os::is_first_C_frame(&cur_frame)); > 877: #endif // _WIN32 > 878: } missing newline ------------- Changes requested by stuefe (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7591 From stuefe at openjdk.java.net Sat Feb 26 06:18:56 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 26 Feb 2022 06:18:56 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v9] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Sat, 26 Feb 2022 05:54:05 GMT, Thomas Stuefe wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix tests > > src/hotspot/share/runtime/os.cpp line 1193: > >> 1191: >> 1192: uintptr_t usp = (uintptr_t)fr->sp(); >> 1193: if ((usp & sp_align_mask) != 0 || !os::is_readable_pointer((const void*)usp)) return true; > > remove cast Also, could you factor out this test to a local helper, something like: static bool pointer_is_bad(uintptr_t p) { ... } ? ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From stuefe at openjdk.java.net Sat Feb 26 06:18:57 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 26 Feb 2022 06:18:57 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v9] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Sat, 26 Feb 2022 05:57:06 GMT, Thomas Stuefe wrote: >> src/hotspot/share/runtime/os.cpp line 1193: >> >>> 1191: >>> 1192: uintptr_t usp = (uintptr_t)fr->sp(); >>> 1193: if ((usp & sp_align_mask) != 0 || !os::is_readable_pointer((const void*)usp)) return true; >> >> remove cast > > Also, could you factor out this test to a local helper, something like: > > static bool pointer_is_bad(uintptr_t p) { > ... > } > > ? And the alignment check would be more readable with the is_aligned() function from align.hpp (this is old code, the function did not exist back then). 
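A minimal sketch of what such a helper could look like, written as standalone C++ so that it compiles outside HotSpot. The names `is_aligned_sketch` and `is_readable_pointer_stub` are hypothetical stand-ins for `is_aligned()` from align.hpp and `os::is_readable_pointer()`; the stub only rejects null and does not probe memory the way the real function is expected to.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for is_aligned() from align.hpp (assumption: alignment is a power of two).
static bool is_aligned_sketch(std::uintptr_t value, std::size_t alignment) {
  return (value & (alignment - 1)) == 0;
}

// Hypothetical stand-in for os::is_readable_pointer(). It only rejects null;
// it does NOT check whether the memory behind the pointer can be read.
static bool is_readable_pointer_stub(const void* p) {
  return p != nullptr;
}

// A pointer is "bad" if it is misaligned or cannot be read.
static bool pointer_is_bad(std::uintptr_t p, std::size_t alignment) {
  return !is_aligned_sketch(p, alignment) ||
         !is_readable_pointer_stub(reinterpret_cast<const void*>(p));
}
```

With a helper of this shape, the stack pointer check in the quoted line could be expressed as a single `pointer_is_bad(usp, ...)` call, and the same helper could be reused for the frame pointer and the sender values.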
------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Sat Feb 26 07:57:54 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Sat, 26 Feb 2022 07:57:54 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v9] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Sat, 26 Feb 2022 06:02:45 GMT, Thomas Stuefe wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix tests > > src/hotspot/share/runtime/os.cpp line 1184: > >> 1182: #ifdef _WINDOWS >> 1183: return true; // native stack isn't walkable on windows this way. >> 1184: #else > > This change has nothing to do with the bug. > > I would leave this as it is and let the code below at least compile on windows. Then we know it does not bitrot there. I am also not clear why this would not work on windows, since we could optionally build with framepointers enabled, right? And don't we have frame pointers on 32-bit windows always? I may remember this wrong. There is a special function on Windows to obtain native stack traces; it uses an OS function. > test/hotspot/gtest/runtime/test_os.cpp line 874: > >> 872: frame invalid_frame; >> 873: EXPECT_TRUE(os::is_first_C_frame(&invalid_frame)); // the frame has zeroes for all values >> 874: > > Please add a test with valid looking but garbage pointers, to test that your safefetch really works. We usually do this by reserving + protecting a stripe of memory and using that one as guaranteed faulting pointer. I thought about it, but was unsure how to do it properly.
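For reference, the "reserve + protect" idea mentioned in the review comment can be sketched outside HotSpot roughly as follows. This is a POSIX-only illustration under stated assumptions, not the way the HotSpot gtests do it: `reserve_protected_page`, `release_protected_page` and `probe_readable` are hypothetical names, and the pipe-based probe is merely a stand-in for SafeFetch that lets the kernel report an unreadable address as an error instead of raising a signal.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Reserve one page without any access rights. The address looks valid,
// but every load from it is guaranteed to fault.
static void* reserve_protected_page() {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* p = mmap(nullptr, page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Give the page back once the test is done.
static void release_protected_page(void* p) {
  if (p != nullptr) {
    munmap(p, static_cast<std::size_t>(sysconf(_SC_PAGESIZE)));
  }
}

// Illustrative probe (assumption, not SafeFetch): ask the kernel to copy one
// byte from p into a pipe. For an unreadable address write() fails instead of
// the process receiving SIGSEGV.
static bool probe_readable(const void* p) {
  int fds[2];
  if (pipe(fds) != 0) {
    return false;
  }
  const ssize_t written = write(fds[1], p, 1);
  close(fds[0]);
  close(fds[1]);
  return written == 1;
}
```

In a test, the address returned by `reserve_protected_page()` would be planted as the stack pointer or frame pointer of a frame, with the expectation that the code under test treats the frame as not walkable rather than crashing.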
------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From stuefe at openjdk.java.net Sat Feb 26 08:23:54 2022 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 26 Feb 2022 08:23:54 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v9] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: On Sat, 26 Feb 2022 07:55:07 GMT, Johannes Bechberger wrote: >> test/hotspot/gtest/runtime/test_os.cpp line 874: >> >>> 872: frame invalid_frame; >>> 873: EXPECT_TRUE(os::is_first_C_frame(&invalid_frame)); // the frame has zeroes for all values >>> 874: >> >> Please add a test with valid looking but garbage pointers, to test that your safefetch really works. We usually do this by reserving + protecting a stripe of memory and using that one as guaranteed faulting pointer. > > I thought about it, but was unsure how to do it properly. No problem, it's probably enough in its current form. ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Sat Feb 26 09:53:53 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Sat, 26 Feb 2022 09:53:53 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v9] In-Reply-To: <3IE97Ur28wo8YNWudqJKhQxDv5iO8cpUGneR-bsFR5s=.dda6f316-2ce1-4f89-b254-47019781ab6d@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> <3IE97Ur28wo8YNWudqJKhQxDv5iO8cpUGneR-bsFR5s=.dda6f316-2ce1-4f89-b254-47019781ab6d@github.com> Message-ID: On Sat, 26 Feb 2022 09:49:22 GMT, Johannes Bechberger wrote: >> No problem, it's probably enough in its current form. > > Ok, but it would be cool if you could tell me how to do it, because I have the suspicion that this is not the only PR that I will ever write regarding segfaults.
But I have to wait till Monday to let someone write an issue :) ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From duke at openjdk.java.net Sat Feb 26 09:53:53 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Sat, 26 Feb 2022 09:53:53 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v9] In-Reply-To: References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: <3IE97Ur28wo8YNWudqJKhQxDv5iO8cpUGneR-bsFR5s=.dda6f316-2ce1-4f89-b254-47019781ab6d@github.com> On Sat, 26 Feb 2022 08:20:26 GMT, Thomas Stuefe wrote: >> I thought about it, but was unsure how to do it properly. > > No problem, it's probably enough in its current form. Ok, but it would be cool if you could tell me how to do it, because I have the suspicion that this is not the only PR that I will ever write regarding segfaults. ------------- PR: https://git.openjdk.java.net/jdk/pull/7591 From coleenp at openjdk.java.net Sat Feb 26 13:12:29 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Sat, 26 Feb 2022 13:12:29 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only [v4] In-Reply-To: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> Message-ID: <4x8eIKVaHBOOreUjGmLKZWfrPj6hTOJmj4zWSktdUik=.78c30107-46ec-4dd0-be30-7eb6bc8f01d1@github.com> > Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. > Tested with tier1 on Oracle platforms. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix CDS omission.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7608/files - new: https://git.openjdk.java.net/jdk/pull/7608/files/f75cf1e2..6b55334a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7608&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7608&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7608.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7608/head:pull/7608 PR: https://git.openjdk.java.net/jdk/pull/7608 From coleenp at openjdk.java.net Sat Feb 26 13:12:30 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Sat, 26 Feb 2022 13:12:30 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only [v3] In-Reply-To: References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> Message-ID: On Fri, 25 Feb 2022 20:19:26 GMT, Coleen Phillimore wrote: >> Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. >> Tested with tier1 on Oracle platforms. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Set _name field in constructor with available symbol rather than later when constant pool pointer is set. I like this one better. Thanks Patricio and Harold. With this fix (needed to walk new field for CDS), the new test passes on all platforms. ------------- PR: https://git.openjdk.java.net/jdk/pull/7608 From coleenp at openjdk.java.net Sat Feb 26 13:23:13 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Sat, 26 Feb 2022 13:23:13 GMT Subject: RFR: 8279573: compiler/codecache/CodeCacheFullCountTest.java fails with "RuntimeException: the value of full_count is wrong." 
Message-ID: <9kpGtp-T1jcm8LYcqrFjUB_VDRth_YnpgdLrarSonSQ=.66e97845-c0a9-4e82-b3e9-464cdffb2c72@github.com> This change adds a conditional to make -XX:-UseCodeCacheFlushing not flush the code cache so that the test passes on loom. It also makes full_count atomic so that the test in codeCache for printing is correct. This change also fixes the test because the full_count field and the message printing are not synchronized, so you can get 2 or more depending on the number of compiler threads. Tested with tier1-3 on linux and windows x64. ------------- Commit messages: - 8279573: compiler/codecache/CodeCacheFullCountTest.java fails with "RuntimeException: the value of full_count is wrong." Changes: https://git.openjdk.java.net/jdk/pull/7629/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7629&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8279573 Stats: 12 lines in 4 files changed: 3 ins; 0 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/7629.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7629/head:pull/7629 PR: https://git.openjdk.java.net/jdk/pull/7629 From dholmes at openjdk.java.net Mon Feb 28 02:06:46 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 28 Feb 2022 02:06:46 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic [v2] In-Reply-To: References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: On Wed, 23 Feb 2022 11:20:11 GMT, Aleksey Shipilev wrote: >> Looks fine. There might be some performance implications to this, as IIRC this code gets called from GC copying, so some light benchmarking might be in order. > >> @shipilev any suggestions as to which benchmarks to try to run for this? Otherwise I'll just try our usual internal ones. > > Just the usual sanity check of benchmarks is fine. If there are regressions on some other benchmarks, we can take care of them after integration. 
Paging @shipilev - please see previous comment. ------------- PR: https://git.openjdk.java.net/jdk/pull/7567 From iklam at openjdk.java.net Mon Feb 28 06:34:14 2022 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 28 Feb 2022 06:34:14 GMT Subject: RFR: 8275731: CDS archived enums objects are recreated at runtime [v7] In-Reply-To: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> References: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> Message-ID: > **Background:** > > In the Java Language, Enums can be tested for equality, so the constants in an Enum type must be unique. Javac compiles an enum declaration like this: > > > public enum Day { SUNDAY, MONDAY ... } > > > to > > > public class Day extends java.lang.Enum { > public static final SUNDAY = new Day("SUNDAY"); > public static final MONDAY = new Day("MONDAY"); ... > } > > > With CDS archived heap objects, `Day::<clinit>` is executed twice: once during `java -Xshare:dump`, and once during normal JVM execution. If the archived heap objects reference one of the Enum constants created at dump time, we will violate the uniqueness requirements of the Enum constants at runtime. See the test case in the description of [JDK-8275731](https://bugs.openjdk.java.net/browse/JDK-8275731) > > **Fix:** > > During -Xshare:dump, if we discovered that an Enum constant of type X is archived, we archive all constants of type X. At runtime, type X will skip the normal execution of `X::<clinit>`. Instead, we run `HeapShared::initialize_enum_klass()` to retrieve all the constants of X that were saved at dump time. > > This is safe as we know that `X::<clinit>` has no observable side effect -- it only creates the constants of type X, as well as the synthetic value `X::$VALUES`, which cannot be observed until X is fully initialized.
> > **Verification:** > > To avoid future problems, I added a new tool, CDSHeapVerifier, to look for similar problems where the archived heap objects reference a static field that may be recreated at runtime. There are some manual steps involved, but I analyzed the potential problems found by the tool and they are all safe (after the current bug is fixed). See cdsHeapVerifier.cpp for gory details. An example trace of this tool can be found at https://bugs.openjdk.java.net/secure/attachment/97242/enum_warning.txt > > **Testing:** > > Passed Oracle CI tiers 1-4. Will run tier 5 as well. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - fixed copyright year - Merge branch 'master' into 8275731-heapshared-enum - fixed whitespace - Fixed comments per @calvinccheung review - Merge branch 'master' into 8275731-heapshared-enum - Use InstanceKlass::do_local_static_fields for some field iterations - Merge branch 'master' into 8275731-heapshared-enum - added exclusions needed by "java -Xshare:dump -ea -esa" - Comments from @calvinccheung off-line - 8275731: CDS archived enums objects are recreated at runtime ------------- Changes: https://git.openjdk.java.net/jdk/pull/6653/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6653&range=06 Stats: 860 lines in 16 files changed: 807 ins; 4 del; 49 mod Patch: https://git.openjdk.java.net/jdk/pull/6653.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6653/head:pull/6653 PR: https://git.openjdk.java.net/jdk/pull/6653 From duke at openjdk.java.net Mon Feb 28 12:36:59 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 28 Feb 2022 12:36:59 GMT Subject: RFR: 8282392: [zero] Build broken on AArch64 Message-ID: 8282392: [zero] Build broken on AArch64 ------------- Commit messages: - 8282392: [zero] Build broken on AArch64 Changes: https://git.openjdk.java.net/jdk/pull/7633/files Webrev:
https://webrevs.openjdk.java.net/?repo=jdk&pr=7633&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8282392 Stats: 13 lines in 5 files changed: 8 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/7633.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7633/head:pull/7633 PR: https://git.openjdk.java.net/jdk/pull/7633 From shade at openjdk.java.net Mon Feb 28 12:52:44 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 28 Feb 2022 12:52:44 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic [v2] In-Reply-To: References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: On Thu, 24 Feb 2022 11:45:17 GMT, David Holmes wrote: > I ran some GC benchmarks which turned out to be just specjbb2005 and specjvm2008-*. > > There were two regressions flagged: > > Linux-x64: SPECjvm2008-LU.large-ZGC -5.82% > macos-x64: SPECjvm2008-Serial-ParGC -4.16% Myself, I never trust LU.large results, since they experience quite large run-to-run variance in our runs. Serial regression is weird, though, it is usually a very stable workload. Does it reproduce locally? If it does not reproduce, we can go ahead and deal with any regressions later. ------------- PR: https://git.openjdk.java.net/jdk/pull/7567 From aph at openjdk.java.net Mon Feb 28 14:26:46 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 28 Feb 2022 14:26:46 GMT Subject: RFR: 8282392: [zero] Build broken on AArch64 In-Reply-To: References: Message-ID: On Mon, 28 Feb 2022 12:28:39 GMT, Alan Hayward wrote: > 8282392: [zero] Build broken on AArch64 src/hotspot/share/utilities/macros.hpp line 543: > 541: #define AARCH64_PORT_ONLY(code) code > 542: #define NOT_AARCH64_PORT_ONLY(code) > 543: #else I don't think we need `NOT_AARCH64_PORT_ONLY`, and it's too confusing. Otherwise OK. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7633 From thartmann at openjdk.java.net Mon Feb 28 14:40:52 2022 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 28 Feb 2022 14:40:52 GMT Subject: RFR: 8279573: compiler/codecache/CodeCacheFullCountTest.java fails with "RuntimeException: the value of full_count is wrong." In-Reply-To: <9kpGtp-T1jcm8LYcqrFjUB_VDRth_YnpgdLrarSonSQ=.66e97845-c0a9-4e82-b3e9-464cdffb2c72@github.com> References: <9kpGtp-T1jcm8LYcqrFjUB_VDRth_YnpgdLrarSonSQ=.66e97845-c0a9-4e82-b3e9-464cdffb2c72@github.com> Message-ID: <09ehUnw153f_DG6GP1nODViodtRmeK6PIJ8O-K3WfiM=.ce8a5999-448c-4f0b-acba-49bba4fec11f@github.com> On Sat, 26 Feb 2022 13:14:57 GMT, Coleen Phillimore wrote: > This change adds a conditional to make -XX:-UseCodeCacheFlushing not flush the code cache so that the test passes on loom. It also makes full_count atomic so that the test in codeCache for printing is correct. This change also fixes the test because the full_count field and the message printing are not synchronized, so you can get 2 or more depending on the number of compiler threads. > Tested with tier1-3 on linux and windows x64. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7629 From duke at openjdk.java.net Mon Feb 28 14:57:48 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 28 Feb 2022 14:57:48 GMT Subject: RFR: 8282392: [zero] Build broken on AArch64 In-Reply-To: References: Message-ID: On Mon, 28 Feb 2022 14:23:58 GMT, Andrew Haley wrote: >> 8282392: [zero] Build broken on AArch64 > > src/hotspot/share/utilities/macros.hpp line 543: > >> 541: #define AARCH64_PORT_ONLY(code) code >> 542: #define NOT_AARCH64_PORT_ONLY(code) >> 543: #else > > I don't think we need `NOT_AARCH64_PORT_ONLY`, and it's too confusing. Otherwise OK. Agreed. Only added it to keep with the style of the file. Will remove. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7633 From aph at openjdk.java.net Mon Feb 28 15:16:52 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 28 Feb 2022 15:16:52 GMT Subject: RFR: 8282392: [zero] Build broken on AArch64 In-Reply-To: References: Message-ID: On Mon, 28 Feb 2022 12:28:39 GMT, Alan Hayward wrote: > 8282392: [zero] Build broken on AArch64 Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/7633 From duke at openjdk.java.net Mon Feb 28 16:05:19 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Mon, 28 Feb 2022 16:05:19 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v10] In-Reply-To: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: > This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method > and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix problem related to NMT The problem is that registering a thread for NMT uses the os::is_first_C_frame method which calls Thread::enable_wx internally. But enable_wx requires that the init_wx method has been called before, not after. Swapping two lines therefore fixes the problem. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7591/files - new: https://git.openjdk.java.net/jdk/pull/7591/files/1f08203f..1714b69a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=08-09 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591 PR: https://git.openjdk.java.net/jdk/pull/7591 From shade at openjdk.java.net Mon Feb 28 16:21:37 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 28 Feb 2022 16:21:37 GMT Subject: RFR: 8282392: [zero] Build broken on AArch64 [v2] In-Reply-To: References: Message-ID: On Mon, 28 Feb 2022 16:18:07 GMT, Alan Hayward wrote: >> 8282392: [zero] Build broken on AArch64 > > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Remove NOT_AARCH64_PORT_ONLY I think it is confusing to have `AARCH64_PORT_ONLY` defines, to be honest. In the similar cases for X86, we just additionally protect these blocks with !ZERO. 
Something like: #if defined(AARCH64) && !defined(ZERO) ret_pc = pauth_strip_pointer(ret_pc); #endif ------------- PR: https://git.openjdk.java.net/jdk/pull/7633 From shade at openjdk.java.net Mon Feb 28 16:21:38 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 28 Feb 2022 16:21:38 GMT Subject: RFR: 8282392: [zero] Build broken on AArch64 In-Reply-To: References: Message-ID: On Mon, 28 Feb 2022 12:28:39 GMT, Alan Hayward wrote: > 8282392: [zero] Build broken on AArch64 See for example: https://github.com/openjdk/jdk/blob/4e7fb41dafaf03baabe18ee1dabefed50d69e16d/src/hotspot/share/utilities/ticks.cpp#L66 ------------- PR: https://git.openjdk.java.net/jdk/pull/7633 From duke at openjdk.java.net Mon Feb 28 16:21:37 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 28 Feb 2022 16:21:37 GMT Subject: RFR: 8282392: [zero] Build broken on AArch64 [v2] In-Reply-To: References: Message-ID: > 8282392: [zero] Build broken on AArch64 Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: Remove NOT_AARCH64_PORT_ONLY ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7633/files - new: https://git.openjdk.java.net/jdk/pull/7633/files/d5952abc..edf11eae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7633&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7633&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7633.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7633/head:pull/7633 PR: https://git.openjdk.java.net/jdk/pull/7633 From chagedorn at openjdk.java.net Mon Feb 28 16:22:25 2022 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 28 Feb 2022 16:22:25 GMT Subject: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v5] In-Reply-To: References: Message-ID: > When printing the native stack trace on Linux (mostly done for 
hs_err files), it only prints the method with its parameters and a relative offset in the method: > > Stack: [0x00007f6e01739000,0x00007f6e0183a000], sp=0x00007f6e01838110, free space=1020k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 > V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec > V [libjvm.so+0x8303ef] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 > V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df > V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69 > V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d > V [libjvm.so+0x12091c9] JavaThread::run()+0x167 > V [libjvm.so+0x1206ada] Thread::call_run()+0x180 > V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f > > This makes it sometimes difficult to see where exactly the methods were called from and sometimes almost impossible when there are multiple invocations of the same method within one method. 
> > This patch improves this by providing source information (filename + line number) to the native stack traces on Linux similar to what's already done on Windows (see [JDK-8185712](https://bugs.openjdk.java.net/browse/JDK-8185712)): > > Stack: [0x00007f34fca18000,0x00007f34fcb19000], sp=0x00007f34fcb17110, free space=1020k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x620d86] Compilation::~Compilation()+0x64 (c1_Compilation.cpp:607) > V [libjvm.so+0x624b92] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec (c1_Compiler.cpp:250) > V [libjvm.so+0x8303ef] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899 (compileBroker.cpp:2291) > V [libjvm.so+0x82f067] CompileBroker::compiler_thread_loop()+0x3df (compileBroker.cpp:1966) > V [libjvm.so+0x84f0d1] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69 (compilerThread.cpp:59) > V [libjvm.so+0x1209329] JavaThread::thread_main_inner()+0x15d (thread.cpp:1297) > V [libjvm.so+0x12091c9] JavaThread::run()+0x167 (thread.cpp:1280) > V [libjvm.so+0x1206ada] Thread::call_run()+0x180 (thread.cpp:358) > V [libjvm.so+0x1012e55] thread_native_entry(Thread*)+0x18f (os_linux.cpp:705) > > For Linux, we need to parse the debug symbols which are generated by GCC in DWARF - a standardized debugging format. This patch adds support for DWARF 4, the default of GCC 10.x, for 32 and 64 bit architectures (tested with x86_32, x86_64 and AArch64). DWARF 5 is not supported as it was still experimental and not generated for HotSpot. However, newer GCC version may soon generate DWARF 5 by default in which case this parser either needs to be extended or the build of HotSpot configured to only emit DWARF 4. > > The code follows the parsing steps described in the official DWARF 4 spec: https://dwarfstd.org/doc/DWARF4.pdf > I added references to the corresponding sections throughout the code. 
However, I tried to explain the steps from the DWARF spec directly in the code (method names, comments etc.). This allows to follow the code without the need to actually deep dive into the spec. > > The comments at the `Dwarf` class in the `elf.hpp` file explain in more detail how a DWARF file is structured and how the parsing algorithm works to get to the filename and line number information. There are more class comments throughout the `elf.hpp` file about how different DWARF sections are structured and how the parsing algorithm needs to fetch the required information. Therefore, I will not repeat the exact workings of the algorithm here but refer to the code comments. I've tried to add as much information as possible to improve the readability. > > Generally, I've tried to stay away from adding any assertions as this code is almost always executed when already processing a VM error. Instead, the DWARF parser aims to just exit gracefully and possibly omit source information for a stack frame instead of risking to stop writing the hs_err file when an assertion would have failed. To debug failures, `-Xlog:dwarf` can be used with `info`, `debug` or `trace` which provides logging messages throughout parsing. > > **Testing:** > Apart from manual testing, I've added two kinds of tests: > - A JTreg test: Spawns new VMs to let them crash in various ways. The test reads the created hs_err files to check if the DWARF parsing could correctly find the filename and line number. For normal HotSpot files, I could not check against hardcoded filenames and line numbers as they are subject to change (especially line number can quickly become different). I therefore just added some sanity checks in the form of "found a non-empty file" and "found a non-zero line number". On top of that, I added tests that let the VM crash in custom C files (which will not change). This enables an additional verification of hardcoded filenames and line numbers. 
> - Gtests: Directly calling the `get_source()` method which initiates DWARF parsing. Tested some special cases, for example, having a buffer that is not big enough to store the filename. > > On top of that, there are also existing JTreg tests that call `-XX:NativeMemoryTracking=detail` which will print a native stack trace with the new source information. These tests were also run as part of the standard tier testing and can be considered as sanity tests for this implementation. > > To make tests work in our infrastructure or if some other setups want to have debug symbols at different locations, I've added support for an additional `_JVM_DWARF_PATH` environment variable. This variable can specify a path from which the DWARF symbol file should be read by the parser if the default locations do not contain debug symbols (required some `make` changes). This is similar to what's done on Windows with `_NT_SYMBOL_PATH`. The JTreg test, however, also works if there are no symbols available. In that case, the test just skips all the assertion checks for the filename and line number. > > I haven't run any specific performance testing as this new code is mainly executed when an error will exit the VM and only if symbol files are available (which is normally not the case when using Java release builds as a user). > > Special thanks to @tschatzl for giving me some pointers to start based on his knowledge from a DWARF 2 parser he once wrote in Pascal and for discussing approaches on how to retrieve the source information and to @erikj79 for providing help for the changes required for `make`! > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 54 commits: - Updating some comments - Cleanup loading dwarf file and add summary - Review comments of first pass by Thomas except dwarf file loading - Merge branch 'master' into JDK-8242181 - Make dwarf tag NOT_PRODUCT - Change log_* to log_develop_* and log_warning to log_develop_info - Update test/hotspot/jtreg/runtime/ErrorHandling/TestDwarf.java Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> - Update test/hotspot/jtreg/runtime/ErrorHandling/TestDwarf.java Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> - Better formatting of trace output - some code move and more cleanups - ... and 44 more: https://git.openjdk.java.net/jdk/compare/efd3967b...5bea4841 ------------- Changes: https://git.openjdk.java.net/jdk/pull/7126/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7126&range=04 Stats: 2665 lines in 19 files changed: 2524 ins; 76 del; 65 mod Patch: https://git.openjdk.java.net/jdk/pull/7126.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7126/head:pull/7126 PR: https://git.openjdk.java.net/jdk/pull/7126 From chagedorn at openjdk.java.net Mon Feb 28 16:22:31 2022 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 28 Feb 2022 16:22:31 GMT Subject: RFR: 8242181: [Linux] Show source information when printing native stack traces in hs_err files [v4] In-Reply-To: References: Message-ID: <5YeksetlUoja6cRgWtaorVsXCDLEQRw8s7B9W5UJOUE=.22f23a33-8877-4c3a-9b6d-f648fb1c4fc3@github.com> On Tue, 22 Feb 2022 09:59:36 GMT, Thomas Schatzl wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Make dwarf tag NOT_PRODUCT > > src/hotspot/share/utilities/elfFile.cpp line 319: > >> 317: } >> 318: log_develop_info(dwarf)("No separate .debuginfo file for library %s. 
It already contains the required DWARF sections.", _filepath); >> 319: _dwarf_file = new (std::nothrow) DwarfFile(_filepath); > > Would it be useful to explicitly bail out on a `nullptr` value here to avoid crashes below? Yes, I think that's the right way. I changed other allocations as well to bail out. > src/hotspot/share/utilities/elfFile.cpp line 357: > >> 355: } >> 356: >> 357: strcpy(debug_pathname, _filepath); > > I'm always a bit uneasy using "raw" `strcpy` instead of `strncpy` and friends. The code seems to be correct though. Yes that's true. I updated usages while introducing a new helper class `DwarfFilePath`. > src/hotspot/share/utilities/elfFile.cpp line 784: > >> 782: } >> 783: >> 784: if (!_reader.read_byte(&_header._address_size) || NOT_LP64(_header._address_size != 4) LP64_ONLY( _header._address_size != 8)) { > > Since this is the second time for the clause `|| NOT_LP64(_header._address_size != 4) LP64_ONLY( _header._address_size != 8)` maybe it is useful to make a constant out of the accepted address size somewhere instead of repeating this over and over. > It's value could even be something like `sizeof(intptr_t)` or so. I agree, I introduced a new constant `DwarfFile::ADDRESS_SIZE`. > src/hotspot/share/utilities/elfFile.cpp line 1070: > >> 1068: // reason, GCC is currently using version 3 as specified in the DWARF 3 spec for the line number program even though GCC should >> 1069: // be using version 4 for DWARF 4 as it emits DWARF 4 by default. >> 1070: return false; > > According to the specification (pg112): > >> `version (uhalf)` >> A version number (see Appendix F). This number is specific to the line number information >> and is independent of the DWARF version number. > > So this is just fine - actually things may break if the code accepted version 4 here assuming that there are breaking differences. > On the other hand Appendix F mentions that DWARF4 contains .debug_line information in version 4. 
The `LineNumberProgram` class should be able to handle both version 3 and 4. There are some differences (see `_dwarf_version` checks). But I found that GCC even mixes version 3 and 4: https://github.com/chhagedorn/jdk/blob/820f0da65ab06b28ac75eec96d35269addda0246/src/hotspot/share/utilities/elfFile.cpp#L1302-L1308 > src/hotspot/share/utilities/elfFile.hpp line 211: > >> 209: >> 210: // Load the DWARF file (.debuginfo) that belongs to this file. >> 211: bool load_dwarf_file(); > > It would be nice to summarize from which places this methods tries to load the debug info to prevent the need for digging for it in the method implementation. Good suggestion. I added a summary and refactored the different loading attempts into separate methods together with a new class `DwarfFilePath` which makes it easier to prepare the different paths. ------------- PR: https://git.openjdk.java.net/jdk/pull/7126 From duke at openjdk.java.net Mon Feb 28 16:23:34 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Mon, 28 Feb 2022 16:23:34 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v11] In-Reply-To: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: > This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method > and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. 
Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix small issues ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7591/files - new: https://git.openjdk.java.net/jdk/pull/7591/files/1714b69a..c8223a75 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=09-10 Stats: 35 lines in 5 files changed: 4 ins; 18 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/7591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591 PR: https://git.openjdk.java.net/jdk/pull/7591 From coleenp at openjdk.java.net Mon Feb 28 16:26:35 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 28 Feb 2022 16:26:35 GMT Subject: RFR: 8279573: compiler/codecache/CodeCacheFullCountTest.java fails with "RuntimeException: the value of full_count is wrong." [v2] In-Reply-To: <9kpGtp-T1jcm8LYcqrFjUB_VDRth_YnpgdLrarSonSQ=.66e97845-c0a9-4e82-b3e9-464cdffb2c72@github.com> References: <9kpGtp-T1jcm8LYcqrFjUB_VDRth_YnpgdLrarSonSQ=.66e97845-c0a9-4e82-b3e9-464cdffb2c72@github.com> Message-ID: <3Usf-CPfXE7q3-1QhVpOomY5LBNVtc9sr2iYbH1BWnQ=.6fdf0589-a203-41f7-8a31-119b8ad60edd@github.com> > This change adds a conditional to make -XX:-UseCodeCacheFlushing not flush the code cache so that the test passes on loom. It also makes full_count atomic so that the test in codeCache for printing is correct. This change also fixes the test because the full_count field and the message printing are not synchronized, so you can get 2 or more depending on the number of compiler threads. > Tested with tier1-3 on linux and windows x64. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: I misunderstood the UseCodeCacheFlushing flag and make it act like MethodFlushing, which is a whole different flag. 
Using MethodFlushing instead in the test makes it pass on loom and mainline. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7629/files - new: https://git.openjdk.java.net/jdk/pull/7629/files/7b790e07..03950bf0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7629&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7629&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/7629.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7629/head:pull/7629 PR: https://git.openjdk.java.net/jdk/pull/7629 From duke at openjdk.java.net Mon Feb 28 16:28:27 2022 From: duke at openjdk.java.net (Johannes Bechberger) Date: Mon, 28 Feb 2022 16:28:27 GMT Subject: RFR: 8282306: os::is_first_C_frame(frame*) crashes on invalid link access [v12] In-Reply-To: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> References: <_oxztIwEWlkrlWHp2-w0-RHbm4iGxppT9zY8mcrKybE=.b4e356e3-2072-4c8c-94fe-41a62f4e48c8@github.com> Message-ID: > This PR introduces a new method `can_access_link` into the frame class to check the accessibility of the link information. It furthermore adds a new `os::is_first_C_frame(frame*, Thread*)` that uses the `can_access_link` method > and the passed thread object to check the validity of frame pointer, stack pointer, sender frame pointer and sender stack pointer. This should reduce the possibilities for crashes. 
Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix trailing whitespace ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/7591/files - new: https://git.openjdk.java.net/jdk/pull/7591/files/c8223a75..219837e3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=11 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=7591&range=10-11 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7591/head:pull/7591 PR: https://git.openjdk.java.net/jdk/pull/7591 From coleenp at openjdk.java.net Mon Feb 28 16:39:46 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 28 Feb 2022 16:39:46 GMT Subject: RFR: 8279573: compiler/codecache/CodeCacheFullCountTest.java fails with "RuntimeException: the value of full_count is wrong." [v2] In-Reply-To: <3Usf-CPfXE7q3-1QhVpOomY5LBNVtc9sr2iYbH1BWnQ=.6fdf0589-a203-41f7-8a31-119b8ad60edd@github.com> References: <9kpGtp-T1jcm8LYcqrFjUB_VDRth_YnpgdLrarSonSQ=.66e97845-c0a9-4e82-b3e9-464cdffb2c72@github.com> <3Usf-CPfXE7q3-1QhVpOomY5LBNVtc9sr2iYbH1BWnQ=.6fdf0589-a203-41f7-8a31-119b8ad60edd@github.com> Message-ID: On Mon, 28 Feb 2022 16:26:35 GMT, Coleen Phillimore wrote: >> This change adds a conditional to make -XX:-UseCodeCacheFlushing not flush the code cache so that the test passes on loom. It also makes full_count atomic so that the test in codeCache for printing is correct. This change also fixes the test because the full_count field and the message printing are not synchronized, so you can get 2 or more depending on the number of compiler threads. >> Tested with tier1-3 on linux and windows x64. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > I misunderstood the UseCodeCacheFlushing flag and make it act like MethodFlushing, which is a whole different flag. Using MethodFlushing instead in the test makes it pass on loom and mainline. Thanks Tobias. Erik asked me off PR why this UseCodeCacheFlushing flag didn't disable the NMethodSweeper completely, since I made it disable flushing methods. Which made me aware of another flag that does what this test should want: product(bool, MethodFlushing, true, \ "Reclamation of zombie and not-entrant methods") \ vs. product(bool, UseCodeCacheFlushing, true, \ "Remove cold/old nmethods from the code cache") \ The latter flag disables removing cold methods from the code cache, where the former disables flushing. I fixed the test to use MethodFlushing instead and verified that it passes on Loom and mainline. ------------- PR: https://git.openjdk.java.net/jdk/pull/7629 From aph at openjdk.java.net Mon Feb 28 16:42:54 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 28 Feb 2022 16:42:54 GMT Subject: RFR: 8282392: [zero] Build broken on AArch64 [v2] In-Reply-To: References: Message-ID: On Mon, 28 Feb 2022 16:16:12 GMT, Aleksey Shipilev wrote: > I think it is confusing to have `AARCH64_PORT_ONLY` defines, to be honest. In the similar cases for X86, we just additionally protect these blocks with !ZERO. Something like: That's what we looked at and it was more of a mess, IMO. In the end it's a judgment call which to have, and I've seen this kind of mistake, where a particular port is confused with a particular CPU, enough times that I think this is OK; YMMV. 
------------- PR: https://git.openjdk.java.net/jdk/pull/7633 From pchilanomate at openjdk.java.net Mon Feb 28 16:49:49 2022 From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo) Date: Mon, 28 Feb 2022 16:49:49 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only [v4] In-Reply-To: <4x8eIKVaHBOOreUjGmLKZWfrPj6hTOJmj4zWSktdUik=.78c30107-46ec-4dd0-be30-7eb6bc8f01d1@github.com> References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> <4x8eIKVaHBOOreUjGmLKZWfrPj6hTOJmj4zWSktdUik=.78c30107-46ec-4dd0-be30-7eb6bc8f01d1@github.com> Message-ID: On Sat, 26 Feb 2022 13:12:29 GMT, Coleen Phillimore wrote: >> Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. >> Tested with tier1 on Oracle platforms. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix CDS ommission. Adding that test case was a good idea :) Looks good! Thanks, Patricio ------------- Marked as reviewed by pchilanomate (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7608 From shade at openjdk.java.net Mon Feb 28 16:51:47 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 28 Feb 2022 16:51:47 GMT Subject: RFR: 8282392: [zero] Build broken on AArch64 [v2] In-Reply-To: References: Message-ID: On Mon, 28 Feb 2022 16:39:48 GMT, Andrew Haley wrote: > That's what we looked at and it was more of a mess, IMO. In the end it's a judgment call which to have, and I've seen this kind of mistake, where a particular port is confused with a particular CPU, enough times that I think this is OK; YMMV. From the perspective of Zero maintenance, having the Zero-specific workarounds explicitly doing `!ZERO` is cleaner. This mess is mostly Zero's problem with identifying itself as a CPU. So, in my mind, there is little reason to accommodate that problem with "port" defines.
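The two spellings being weighed in this exchange can be put side by side. What follows is a minimal sketch rather than the patch itself: `some_function` is a placeholder, and the definition of `AARCH64_PORT_ONLY` is an assumed shape (keep the argument only when `AARCH64` is defined and `ZERO` is not), chosen to match the `!ZERO` guard described above.

```cpp
// Hypothetical sketch: in a real HotSpot build, AARCH64 and ZERO are set by
// the build system. Neither is defined here unless passed via -D.

// Style 1 (assumed definition): a port macro that keeps its argument only
// for the native AArch64 port and drops it everywhere else, Zero included.
#if defined(AARCH64) && !defined(ZERO)
#define AARCH64_PORT_ONLY(code) code
#else
#define AARCH64_PORT_ONLY(code)
#endif

static int calls = 0;

// Placeholder for a routine that exists only in the native AArch64 port.
void some_function() { ++calls; }

// Style 1 at the call site: one line, the condition lives in the macro.
int with_port_macro() {
  calls = 0;
  AARCH64_PORT_ONLY(some_function());
  return calls;
}

// Style 2 at the call site: the Zero exclusion is spelled out in place.
int with_explicit_guard() {
  calls = 0;
#if defined(AARCH64) && !defined(ZERO)
  some_function();
#endif
  return calls;
}
```

Under this assumed definition both functions behave identically in every configuration: they return 1 when compiled with `-DAARCH64` alone, and 0 with `-DAARCH64 -DZERO` or with neither define. The disagreement is therefore only about which spelling is easier to read and maintain.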
------------- PR: https://git.openjdk.java.net/jdk/pull/7633 From shade at openjdk.java.net Mon Feb 28 17:40:52 2022 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 28 Feb 2022 17:40:52 GMT Subject: RFR: 8282392: [zero] Build broken on AArch64 [v2] In-Reply-To: References: Message-ID: On Mon, 28 Feb 2022 16:21:37 GMT, Alan Hayward wrote: >> 8282392: [zero] Build broken on AArch64 > > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Remove NOT_AARCH64_PORT_ONLY Fine, let's do it in this form. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7633 From aph at openjdk.java.net Mon Feb 28 17:40:52 2022 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 28 Feb 2022 17:40:52 GMT Subject: RFR: 8282392: [zero] Build broken on AArch64 [v2] In-Reply-To: References: Message-ID: <_IW-vmjBMki32oD_xhjLWurDk5CGtAH0o1oWcS1-tsA=.11bfefec-625c-4c22-92a7-f12bc32d9fa3@github.com> On Mon, 28 Feb 2022 16:48:35 GMT, Aleksey Shipilev wrote: > > That's what we looked at and it was more of a mess, IMO. In the end it's a judgment call which to have, and I've seen this kind of mistake, where a particular port is confused with a particular CPU, enough times that I think this is OK; YMMV. > > From the perspective of Zero maintenance, having the Zero-specific workarounds explicitly doing `!ZERO` is cleaner. This mess is mostly Zero's problem with identifying itself as a CPU. So, in my mind, there is little reason to accommodate that problem with "port" defines. I think I understand your point, but IMO it's almost always easier to understand language which says what something is than what it isn't, and a simple name than a boolean expression. And that is more important, I believe. Having said that, if you insist that flagging this up as a Zero-specific workaround with `!ZERO` is really important I will give way to your preference.
(I don't think it is: I think we should flag this code as port-specific, not CPU-specific. But mostly I just want this patch pushed.) ------------- PR: https://git.openjdk.java.net/jdk/pull/7633 From duke at openjdk.java.net Mon Feb 28 17:40:52 2022 From: duke at openjdk.java.net (Alan Hayward) Date: Mon, 28 Feb 2022 17:40:52 GMT Subject: RFR: 8282392: [zero] Build broken on AArch64 [v2] In-Reply-To: References: Message-ID: On Mon, 28 Feb 2022 16:21:37 GMT, Alan Hayward wrote: >> 8282392: [zero] Build broken on AArch64 > > Alan Hayward has updated the pull request incrementally with one additional commit since the last revision: > > Remove NOT_AARCH64_PORT_ONLY My only issue with that is that: AARCH64_PORT_ONLY(some_function()); becomes: #if defined(AARCH64) && !defined(ZERO) some_function(); #endif Which is a little uglier. How about defining the macro something like: #if defined(AARCH64) && !defined(ZERO) #define AARCH64_NOT_ZERO(code) code (ultimately, I'm happy with any of the above) ------------- PR: https://git.openjdk.java.net/jdk/pull/7633 From duke at openjdk.java.net Mon Feb 28 18:52:18 2022 From: duke at openjdk.java.net (Evgeny Astigeevich) Date: Mon, 28 Feb 2022 18:52:18 GMT Subject: RFR: 8280872: Reorder code cache segments to improve code density [v2] In-Reply-To: References: Message-ID: <6yR77yO0CGw6ciJPa97cS0O3PCsWznBy9x0x6ILWLZc=.43ad49ab-4ad0-49d9-9098-da4fef38dabf@github.com> On Wed, 23 Feb 2022 21:52:11 GMT, Boris Ulasevich wrote: >> Currently the codecache segment order is [non-nmethod, non-profiled, profiled]. With this change we move the non-nmethod segment between two code segments. It changes nothing for any platform besides AARCH. >> >> In AARCH the offset limit for a branch instruction is 128MB. The bigger jumps are encoded with three instructions. Most of far branches are jumps into the non-nmethod blobs. With the non-nmethod segment in between code segments the jump distance from method to the stub becomes shorter. 
The result is a 4% reduction in generated code size for the CodeCache range from 128MB to 240MB. >> >> As a side effect, the performance of some tests is slightly improved: >> ``ArraysFill.testCharFill 10 thrpt 15 170235.720 -> 178477.212 ops/ms`` >> >> Testing: jdk/hotspot jtreg and microbenchmarks on AMD and AARCH > > Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - fix name: is_non_nmethod, adding target_needs_far_branch func > - change codecache segments order: nonprofiled-nonmethod-profiled > increase far jump threshold: sideof(codecache)=128M -> sizeof(nonprofiled+nonmethod)=128M src/hotspot/cpu/aarch64/icBuffer_aarch64.cpp line 55: > 53: Label l; > 54: __ ldr(rscratch2, l); > 55: __ far_jump(ExternalAddress(entry_point), NULL, rscratch1, true); This complicates `assemble_ic_buffer_code`. You need to know the `far_jump` implementation, especially the generation of NOPs. I understand why we need those NOPs. Do we have calls of non-nmethod code here? src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 393: > 391: assert(CodeCache::find_blob(entry.target()) != NULL, > 392: "destination of far call not found in code cache"); > 393: assert(CodeCache::is_non_nmethod(entry.target()), "must be a call to the code stub"); This restricts far calls to be calls of non-nmethod code. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4379: > 4377: postcond(pc() == badAddress); > 4378: return NULL; > 4379: } I believe replacing `trampoline_call` by `far_call` should be a separate PR. src/hotspot/cpu/aarch64/nativeInst_aarch64.cpp line 533: > 531: address stub = NULL; > 532: > 533: if (a.codecache_branch_needs_far_jump() I prefer it to be `a.target_needs_far_jump(dest)`. `codecache_branch` looks like code cache branches need far jumps. It is strange because the code cache is just storage. It is the code generator that has to use far jumps.
src/hotspot/share/code/codeCache.cpp line 898: > 896: } > 897: > 898: size_t CodeCache::max_distance_to_codestub() { `max_distance_to_non_nmethod_heap`? As this is public API, it sounds strange without the start point. If someone changes positions of the heap, would it work as expected? ------------- PR: https://git.openjdk.java.net/jdk/pull/7517 From hseigel at openjdk.java.net Mon Feb 28 20:35:21 2022 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 28 Feb 2022 20:35:21 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only [v4] In-Reply-To: <4x8eIKVaHBOOreUjGmLKZWfrPj6hTOJmj4zWSktdUik=.78c30107-46ec-4dd0-be30-7eb6bc8f01d1@github.com> References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> <4x8eIKVaHBOOreUjGmLKZWfrPj6hTOJmj4zWSktdUik=.78c30107-46ec-4dd0-be30-7eb6bc8f01d1@github.com> Message-ID: On Sat, 26 Feb 2022 13:12:29 GMT, Coleen Phillimore wrote: >> Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. >> Tested with tier1 on Oracle platforms. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix CDS ommission. Looks good! Harold ------------- Marked as reviewed by hseigel (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/7608 From coleenp at openjdk.java.net Mon Feb 28 20:35:21 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 28 Feb 2022 20:35:21 GMT Subject: RFR: 8282240: Add _name field to Method for NOT_PRODUCT only [v4] In-Reply-To: <4x8eIKVaHBOOreUjGmLKZWfrPj6hTOJmj4zWSktdUik=.78c30107-46ec-4dd0-be30-7eb6bc8f01d1@github.com> References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> <4x8eIKVaHBOOreUjGmLKZWfrPj6hTOJmj4zWSktdUik=.78c30107-46ec-4dd0-be30-7eb6bc8f01d1@github.com> Message-ID: On Sat, 26 Feb 2022 13:12:29 GMT, Coleen Phillimore wrote: >> Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. >> Tested with tier1 on Oracle platforms. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix CDS ommission. Thanks Harold and Patricio! ------------- PR: https://git.openjdk.java.net/jdk/pull/7608 From coleenp at openjdk.java.net Mon Feb 28 20:35:21 2022 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 28 Feb 2022 20:35:21 GMT Subject: Integrated: 8282240: Add _name field to Method for NOT_PRODUCT only In-Reply-To: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> References: <-earTaon4tAWa42gIN_zQGm297N0MCypdcEyaBGY9CE=.69d09b35-1b7c-4428-b32a-9e7a3bee5aea@github.com> Message-ID: On Thu, 24 Feb 2022 12:50:01 GMT, Coleen Phillimore wrote: > Whenever I'm debugging I really wish I knew the name of the method that I'm looking at, so I added this field in not-product. > Tested with tier1 on Oracle platforms. This pull request has now been integrated. 
Changeset: c7cd1487 Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/c7cd1487fe00172be59e7571991f960c59b8c0eb Stats: 42 lines in 5 files changed: 33 ins; 0 del; 9 mod 8282240: Add _name field to Method for NOT_PRODUCT only Reviewed-by: pchilanomate, hseigel ------------- PR: https://git.openjdk.java.net/jdk/pull/7608 From iklam at openjdk.java.net Mon Feb 28 20:38:24 2022 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 28 Feb 2022 20:38:24 GMT Subject: RFR: 8275731: CDS archived enums objects are recreated at runtime [v4] In-Reply-To: References: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> Message-ID: On Thu, 17 Feb 2022 23:20:41 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Use InstanceKlass::do_local_static_fields for some field iterations > > Looks good. Minor comment below. > Also, several files with copyright year 2021 need updating. Thanks @calvinccheung and @coleenp for the review. Passed tiers 1-5. ------------- PR: https://git.openjdk.java.net/jdk/pull/6653 From iklam at openjdk.java.net Mon Feb 28 20:38:24 2022 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 28 Feb 2022 20:38:24 GMT Subject: Integrated: 8275731: CDS archived enums objects are recreated at runtime In-Reply-To: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> References: <9XdQFi_-JzM91ET0nN1gRCp8ZfMGBz1BwXglxqb8phg=.c643d5a5-b99a-4ce2-8616-9c1472e521b7@github.com> Message-ID: On Wed, 1 Dec 2021 20:47:20 GMT, Ioi Lam wrote: > **Background:** > > In the Java Language, Enums can be tested for equality, so the constants in an Enum type must be unique. Javac compiles an enum declaration like this: > > > public enum Day { SUNDAY, MONDAY ... 
} > > > to > > > public class Day extends java.lang.Enum { > public static final Day SUNDAY = new Day("SUNDAY"); > public static final Day MONDAY = new Day("MONDAY"); ... > } > > > With CDS archived heap objects, `Day::<clinit>` is executed twice: once during `java -Xshare:dump`, and once during normal JVM execution. If the archived heap objects reference one of the Enum constants created at dump time, we will violate the uniqueness requirements of the Enum constants at runtime. See the test case in the description of [JDK-8275731](https://bugs.openjdk.java.net/browse/JDK-8275731) > > **Fix:** > > During -Xshare:dump, if we discover that an Enum constant of type X is archived, we archive all constants of type X. At runtime, type X will skip the normal execution of `X::<clinit>`. Instead, we run `HeapShared::initialize_enum_klass()` to retrieve all the constants of X that were saved at dump time. > > This is safe as we know that `X::<clinit>` has no observable side effect -- it only creates the constants of type X, as well as the synthetic value `X::$VALUES`, which cannot be observed until X is fully initialized. > > **Verification:** > > To avoid future problems, I added a new tool, CDSHeapVerifier, to look for similar problems where the archived heap objects reference a static field that may be recreated at runtime. There are some manual steps involved, but I analyzed the potential problems found by the tool and they are all safe (after the current bug is fixed). See cdsHeapVerifier.cpp for gory details. An example trace of this tool can be found at https://bugs.openjdk.java.net/secure/attachment/97242/enum_warning.txt > > **Testing:** > > Passed Oracle CI tiers 1-4. Will run tier 5 as well. This pull request has now been integrated.
Changeset: d983d108 Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/d983d108c565654e717e2811d88aa94d982da2f5 Stats: 860 lines in 16 files changed: 807 ins; 4 del; 49 mod 8275731: CDS archived enums objects are recreated at runtime Reviewed-by: coleenp, ccheung ------------- PR: https://git.openjdk.java.net/jdk/pull/6653 From dholmes at openjdk.java.net Mon Feb 28 23:35:03 2022 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 28 Feb 2022 23:35:03 GMT Subject: RFR: 8227369: pd_disjoint_words_atomic() needs to be atomic [v2] In-Reply-To: References: <5VWTTzHHgW3zN3B7ANKTF4_wjp5FEYlrXucH0Shx_Ig=.f3291823-90c1-4e61-8e21-916e664cd5a2@github.com> Message-ID: <1HhKKfAKeHpy1WepMt3d0Zh2Gn21hvRaK8yJawdGbr8=.591981dc-f018-4419-84d9-df10cf211f3a@github.com> On Mon, 28 Feb 2022 12:49:25 GMT, Aleksey Shipilev wrote: >> I ran some GC benchmarks which turned out to be just specjbb2005 and specjvm2008-*. >> >> There were two regressions flagged: >> >> Linux-x64: SPECjvm2008-LU.large-ZGC -5.82% >> macos-x64: SPECjvm2008-Serial-ParGC -4.16% >> >> However, Erik thinks these are just noise as apparently ZGC doesn't use these atomic copy routines, nor does he think ParGC does either. >> >> Thoughts? > >> I ran some GC benchmarks which turned out to be just specjbb2005 and specjvm2008-*. >> >> There were two regressions flagged: >> >> Linux-x64: SPECjvm2008-LU.large-ZGC -5.82% >> macos-x64: SPECjvm2008-Serial-ParGC -4.16% > > Myself, I never trust LU.large results, since they experience quite large run-to-run variance in our runs. Serial regression is weird, though, it is usually a very stable workload. Does it reproduce locally? If it does not reproduce, we can go ahead and deal with any regressions later. @shipilev I can't run these benchmarks "locally" (I don't have the benchmarks nor a macOS system). I will try submitting another run just for that benchmark. ------------- PR: https://git.openjdk.java.net/jdk/pull/7567